diff --git a/.gitattributes b/.gitattributes index 280a96c5e6a1933d4782ff2ca206012009caf130..f2593a6837b27fb1f6d5facee96ed808ead3eb8d 100644 --- a/.gitattributes +++ b/.gitattributes @@ -35,3 +35,2260 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text *tfevents* filter=lfs diff=lfs merge=lfs -text tokenizer.json filter=lfs diff=lfs merge=lfs -text trt/NeMo_bfloat16_tp1_rank0.engine filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/gemma-7b-sql-nemo.nemo filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.7.0 filter=lfs diff=lfs 
merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/0.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/1.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/10.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/11.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/12.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/13.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/14.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/15.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/16.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/17.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/18.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/19.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/2.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/20.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.6.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/21.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/22.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/23.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/24.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/25.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/26.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/27.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.4.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/3.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/4.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.6.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/5.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/6.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/7.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/8.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc1.weight/9.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.mlp.linear_fc2.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/3.0.0 filter=lfs 
diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/7.0.1 filter=lfs diff=lfs 
merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/9.0.2 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_proj.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 filter=lfs diff=lfs 
merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 filter=lfs diff=lfs 
merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 filter=lfs diff=lfs 
merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.embedding.word_embeddings.weight/0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.embedding.word_embeddings.weight/1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.embedding.word_embeddings.weight/2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.embedding.word_embeddings.weight/3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/0.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/1.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/10.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/11.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/12.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/13.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/14.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/15.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/16.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/17.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/18.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/19.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/2.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/20.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/21.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/22.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/23.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/24.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/25.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/26.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.6.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/27.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/3.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/4.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/5.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/6.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/7.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/8.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.4.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc1.weight/9.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.mlp.linear_fc2.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_proj.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.embedding.word_embeddings.weight/0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.embedding.word_embeddings.weight/1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.embedding.word_embeddings.weight/2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg_sq.model.embedding.word_embeddings.weight/3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.mlp.linear_fc2.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/0.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/0.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/0.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/1.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/1.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/1.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/10.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/10.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/10.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/11.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/11.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/11.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/12.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/12.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/12.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/13.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/13.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/13.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/14.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/14.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/14.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/15.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/15.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/15.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/16.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/16.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/16.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/17.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/17.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/17.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/18.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/18.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/18.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/19.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/19.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/19.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/2.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/2.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/2.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/20.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/20.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/20.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/21.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/21.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/21.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/22.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/22.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/22.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/23.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/23.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/23.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/24.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/24.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/24.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/25.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/25.0.2 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/25.0.3 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/26.0.1 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/26.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/26.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/27.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/27.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/27.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/3.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/3.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/3.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/4.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/4.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/4.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/5.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/5.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/5.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/6.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/6.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/6.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/7.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/7.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/7.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/8.0.1 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/8.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/8.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/9.0.1 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/9.0.2 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_proj.weight/9.0.3 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 
filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.embedding.word_embeddings.weight/0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.embedding.word_embeddings.weight/1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.embedding.word_embeddings.weight/2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.fp32_param.model.embedding.word_embeddings.weight/3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/0.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/1.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/10.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/11.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/12.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/13.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/14.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/15.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/16.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/17.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/18.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/19.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/2.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/20.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/21.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/22.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/23.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.6.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/24.7.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.0.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.1.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.2.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.3.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.4.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.5.0 filter=lfs diff=lfs merge=lfs -text
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/25.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.6.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/26.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/27.7.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/3.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.0.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/4.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.1.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/5.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.2.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/6.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.3.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/7.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.4.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.5.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/8.7.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.0.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.1.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.2.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.3.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.4.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.5.0 filter=lfs diff=lfs merge=lfs -text 
+nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.6.0 filter=lfs diff=lfs merge=lfs -text +nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.param.model.decoder.layers.mlp.linear_fc1.weight/9.7.0 filter=lfs diff=lfs merge=lfs -text diff --git a/nemo/checkpoints/gemma-7b-sql-nemo.nemo b/nemo/checkpoints/gemma-7b-sql-nemo.nemo new file mode 100644 index 0000000000000000000000000000000000000000..e4f8a39ecab7e6181f2f82b824a22487e1fa9b63 --- /dev/null +++ b/nemo/checkpoints/gemma-7b-sql-nemo.nemo @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5913b2f1371a31e7a25b5cdc2a1946e4be67daa4591ae06b4def257d474b87a9 +size 17081016320 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/common.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/common.pt new file mode 100644 index 0000000000000000000000000000000000000000..ac4407ed73692c858f3b6b624dc5b7b701b90c11 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/common.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d358ddab6b28f844cf94b3d0786a5ec3c7fb8b3968d0b59e58a1fed8ce3d16d6 +size 25175 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/metadata.json b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/metadata.json new file mode 100644 index 0000000000000000000000000000000000000000..efdcae4b720b402ac0295007ff69eefab33a2e82 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/metadata.json @@ -0,0 +1 @@ +{"sharded_backend": 
"zarr", "sharded_backend_version": 1, "common_backend": "torch", "common_backend_version": 1} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.final_layernorm.weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.final_layernorm.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..116a370cfb4a874aac3a7282f55644defe240022 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.final_layernorm.weight/.zarray @@ -0,0 +1,14 @@ +{ + "chunks": [ + 3072 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 3072 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.final_layernorm.weight/0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.final_layernorm.weight/0 new file mode 100644 index 0000000000000000000000000000000000000000..505912dba80ebad4a7ebea4a701ff6b378a03c7d Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.final_layernorm.weight/0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_0_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_0_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdfd5deaa7cef9ec35e981f5fde6534cb556554c --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_0_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63f63a650a1e81efb1222bec5ea786efd0bfb9a5e80530f442bf91f9acdbf8df +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_10_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_10_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a440038d68af2458a2032eeaf2e1fa79cf5c8333 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_10_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:227c205836c7bbcc512b59dc9008d0c412699bd03df9aebfee0b7bd3c3e329c4 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_11_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_11_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..609ac2f1bdff7c62b652fb7f1d0281a8e154bb08 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_11_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b05ac4f60393e0c09711fc249a87bc2e036047e8dcc5d63402503e7d1d662c31 +size 1840 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_12_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_12_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..8417bad1d79124c6d415b688488f813e9aa59250 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_12_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:764ad22a31d074c0951a6a5dbd73f033ca459ecbb5ac362236981e8fe12da56c +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_13_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_13_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..e01a6610c2cabfee5ab8f23b7be40f83fa72adfe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_13_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e620f1086ba7f104be16365fc69e8487a32d8bd7acbdf63f87bf802447b6466d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_14_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_14_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..d7dc479dce1cf4d854fed381732cddbded4919da --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_14_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6910bdfa718b3600854184a6b1f32a8e9d9be3ce10c17f12fe6db79120786a2f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_15_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_15_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9dc5d5187f1ec2891ca599e47c932ee715c6842d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_15_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9acd9fb25e6ab8a85b6ab125c958ee5480710dab44592b1fb84a4eb69872a013 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_16_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_16_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b41c96bd9ae3456f318bd68a6b349bcac76020d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_16_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:885ef049145a29bcdab6fefc355a8e1236bd882c226930cf530a3ba0bd4ee721 +size 1840 
diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_17_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_17_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..4aa15a759da172cd1b03640bc9277aa61350bfaf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_17_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22bfd3234ccd2293b4de4762d491d064b46cb4558254220f7634545715025838 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_18_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_18_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..2881e7895900a5d72eb8a991ee08c58f20f253c8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_18_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c02c0faf91090dbe1822a700e9bc7f91f16e835f7f826eab91a0443d48cac46 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_19_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_19_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..49e7902ecb6a4b2823a4714d278459095b09acd6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_19_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf409a5924123568716ca4dcbcd3b5f1ba9624a1713a170c91d025cb035207cc +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_1_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_1_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0775267108891887248012335887ef50e474889 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_1_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e8b079d0103f30cd750bc238764389ba0c37424878264066e76bfdbe45c0562 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_20_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_20_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..aed8bf7a92522b09c2c6d802baf18c6420c7ab0e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_20_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2188e59e72780c58a3acd8e6ff7fe03f373d14a42a0e36c69f1583b349cbc4a7 +size 1840 
diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_21_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_21_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..54c8e397876158258c7e3d55678ec2b5b76b9ca9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_21_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5a40ef6b59bc7e2ac745df57f28cc08ba614e13e73c12c7a16ac9e3bb005a74 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_22_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_22_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c74304cc0514ddcbdfda7fc598edaee49f7ca20c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_22_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb827b13ea508701f2eb587f7c075983c97a7aedac472cc63f237756c5961c76 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_23_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_23_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..5496445b8f4849c98c75f3e0dc375b647d7e3c04 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_23_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a1927306a894bb424ec9e0ce4a672d7afb28d3bb9d5db1c6e1268c2ec58232a +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_24_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_24_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c68fd9e24c7965a63ad4bbcfa04638f17d6c974 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_24_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84ab4816acfe4bf4df6814b170665c0b14ff82e64c6a3db350780a5ee58961e1 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_25_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_25_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a1dab169defb9b542680af169499ec57bbc631cf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_25_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e0bc28d64fcfc1620f51ba7912eb5f3e6757292e45e4256c2d11914214bbc77 +size 1840 
diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_26_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_26_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..97bd93532dc4ef87c88fdfb3f5e5e4668e1ce3d0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_26_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4e987570aaa427c84d57db99f8a2529370b6bb4bb6e61a5c0a280fc3f8b1f3f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_27_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_27_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c3239fcff67bf15ce49e9e79e187324a699095c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_27_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:addb78803cdfa5fcc25acb716a18e71def373b29af9d89294cc8bdda95b3757d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_2_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_2_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..7e8b538fb206407dbb80bc34f1b59c33d352e864 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_2_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce21d6653df4f0be7621f065b58aa61c970bef1c6dbbcbf018391ba742f93e4d +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_3_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_3_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb6af6b317ec0581d251e39fc0540030056be26d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_3_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e17ee9ca84caa8e84a29eae3086e03cdcd8242fa586638054520904b7d5811a +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_4_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_4_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..754f4bcfa097f57f7181dee82f61b5288cb5671e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_4_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ceab42286b21b2bc4a581e3b7a3cb3de527ddf20deea8354d87bef6ec9b8b648 +size 1836 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_5_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_5_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..58e175f50a43d4c1deaeb2acbfc26921669fa447 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_5_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c10d49188641f77ed026550da10a961d3e7096ed1c71fcdaf8c4a4964dd5d8b2 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_6_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_6_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f071d0d9ee8e42d35e1ac4d622c1d71a4ba7906 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_6_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9b99c08a5951d88fddc30f73527ab22295e4bdc040febc36a47616428f879fc +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_7_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_7_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..c301c90b7efddcee8db52cbcfc196386ac6c216e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_7_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65b2a3db54c013bd9849efb2db9c10758012e0f5a5d1f31397e97482756600e6 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_8_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_8_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7ca746be773968cf1efd88a58dbbc7850baa321 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_8_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9e62e3c64c74eceaaa2bbd8b35484b88271cfe637474693ce93c978f0bed7ad +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_9_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_9_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b11a1c96a58d2a4085c34fec35f3ac4e9f30524 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1._extra_state/shard_9_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0069d31c4f93e27495b18776b0c2fe67027ca0e663f5174f085b69c0cd60df36 +size 1836 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..5b98056452be6adf83cd241da1380f6b4effa63b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/.zarray @@ -0,0 +1,16 @@ +{ + "chunks": [ + 1, + 3072 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 28, + 3072 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/0.0 new file mode 100644 index 0000000000000000000000000000000000000000..f8bc814af041a3b253a2ec5e4f53ba7376129cda Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/0.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/1.0 new file mode 100644 index 0000000000000000000000000000000000000000..75997332ebb809c949b18e1c8baf460fd790e66e Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/1.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/10.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/10.0 new file mode 100644 index 0000000000000000000000000000000000000000..b97e17dda22b85035580937a0936a8cdcf06dab1 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/10.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/11.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/11.0 new file mode 100644 index 0000000000000000000000000000000000000000..c2e1c8478cffafff2dd64bcd92c4bc84499b8a2f Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/11.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/12.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/12.0 new file mode 100644 index 0000000000000000000000000000000000000000..ecf22db7f932da1f530f2296309630be690e39d7 Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/12.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/13.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/13.0 new file mode 100644 index 0000000000000000000000000000000000000000..e40081851cfe138a165f7d570122685350065f1d Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/13.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/14.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/14.0 new file mode 100644 index 0000000000000000000000000000000000000000..944e458050157cf36c2e621f42931b3210a8ad6f Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/14.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/15.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/15.0 new file mode 100644 index 0000000000000000000000000000000000000000..6e06dc45ae9d8b23c0d49ed4ab7d221a6bf3586e Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/15.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/16.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/16.0 new file mode 100644 index 0000000000000000000000000000000000000000..bf6f03c3a731eac6582cda771b6f2305c56b00ff Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/16.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/17.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/17.0 new file mode 100644 index 0000000000000000000000000000000000000000..0d06bc34500651a0e3594a5d969c176fafd08016 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/17.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/18.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/18.0 new file mode 100644 index 0000000000000000000000000000000000000000..4a4aa8d9b13798db406e0c2a770d99679ccf807b Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/18.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/19.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/19.0 new file mode 100644 index 0000000000000000000000000000000000000000..5674740815e6f645e0ae1db25ded475a01fd5e9b Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/19.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/2.0 new file mode 100644 index 0000000000000000000000000000000000000000..64a4549a03d1ea026b0dedf601d1929e2aac5009 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/2.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/20.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/20.0 new file mode 100644 index 0000000000000000000000000000000000000000..50027f8d268487176798a476df5cfd777533e641 Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/20.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/21.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/21.0 new file mode 100644 index 0000000000000000000000000000000000000000..4fbb7c419e30ede3a91d5615f24221a4a07e1c8e Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/21.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/22.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/22.0 new file mode 100644 index 0000000000000000000000000000000000000000..f7d7ef78ceb66815f09cfa4488fe1c55cd2453e3 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/22.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/23.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/23.0 new file mode 100644 index 0000000000000000000000000000000000000000..69643deae30da8158efb433c0ed60721eb1a7aa9 Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/23.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/24.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/24.0 new file mode 100644 index 0000000000000000000000000000000000000000..61d040dcd46d09279ea9792dacac4b45db1bb100 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/24.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/25.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/25.0 new file mode 100644 index 0000000000000000000000000000000000000000..633e6a3da73dfad46917f5cf83b5000641aa8b8c Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/25.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/26.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/26.0 new file mode 100644
index 0000000000000000000000000000000000000000..e5e297c02ebadf492b9a318dfb8fe2d1582908c8 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/26.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/27.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/27.0 new file mode 100644 index 0000000000000000000000000000000000000000..840f63a59cbd18737ff7734672db3364fc499036 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/27.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/3.0 new file mode 100644 index 0000000000000000000000000000000000000000..865a0ade3d0015130a1a6263dac93b0277a48487 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/3.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/4.0 new file mode 100644 index 
0000000000000000000000000000000000000000..bd15786d5ae843081fb3262e9128abdad742eebf Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/4.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/5.0 new file mode 100644 index 0000000000000000000000000000000000000000..1051c7781d7f4123e9ccb79e421cbca34381f714 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/5.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/6.0 new file mode 100644 index 0000000000000000000000000000000000000000..059b0f5f04cb249b5024038df742c707d22c0df3 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/6.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/7.0 new file mode 100644 index 
0000000000000000000000000000000000000000..5681581eef3b9017d9be2216fb7a3ebf44e4ab88 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/7.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/8.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/8.0 new file mode 100644 index 0000000000000000000000000000000000000000..b249e44b37d8290a6817ba68ce1b008b10b5bbd4 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/8.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/9.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/9.0 new file mode 100644 index 0000000000000000000000000000000000000000..f99601e285abecd1f3a8cf4915ed4abed68d3bb5 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.layer_norm_weight/9.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..3f3b87589f26b540b976cdf5ad9d88e236a72841 --- 
/dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/.zarray @@ -0,0 +1,18 @@ +{ + "chunks": [ + 1, + 6144, + 3072 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 28, + 49152, + 3072 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..b0000ba8c3671c1036d95465f2f347533e2a868b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c30e682396a9dcceab963c016985827054bde0e89f7c0265a8a482f4fddbd7f6 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..3cb575d2ea8a147b91abbd9b54c2a03aa56bc517 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02267689603f5907a185996985ccde81abe76ad92225a2c71ac9295b9bcba181 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..4484e37a6e66ec8202124f494cab0f3e065d2d74 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9be75e3cc9c5cc74302f3398deb1298872a269a6e96dff52ce4e1bf4d6518fd +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..ead938b004706c1204e5f2c380b60b64d6cd3d23 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:515bcc18b3a273eedac6b47cf9358ba822a07cb3c2c64fa7937a2ac2528fca9f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..b263f731134ce5141b3965388048b5353ec85db8 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:457bd3ff930d5468183693b027ff4a434c5b7ca41b1c3ea8b24d7448a95302b7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..d041d0f90fc6725be1e67e55dc489e7ca5d6694f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2fc2705df8281fec2d41e70859b40318ec3216d86633f036517306f0b0f4b46d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..f62c38e6627e2572135f7c40896535e363cb857a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16e971b4aa99b2fc0e98c5ed0b372a9c873f8445323c95a9019c3f7b04ba43eb +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.7.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..95df587f0efe287dd82a5d8ec114933afcbba6fb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/0.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:926f8ad9fbc976ca4a45c7f9a3c6e45704d95ac4e240b05ac574b5b8de843f40 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..762adb5d56c5a8a92d71ae5fa288f2975fbff91b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a73ca1b208254321dc66abe490a4c8fea7a967d4609c63a9b15c3caa0a8fc99c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..370a1e6a2f78534ec7a1e30ac32dfccde9d1f775 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:6964b969d270fae8312e104ffe7d6aedbffca7e18947d5c0b7788d76e0d3ff10 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..81f2e9c4f285594e8d72a125e00d8b8f4f49e30d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0fba84539d5fd9043fc9b783d56ba7ce821a72f279b336f0ed7016308616901a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..dab2d918ddcc44916717190bcbdfc1f09e37dadc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0534390ceb2029e6093fed20b562e8624e9c15190b3726ad406a86bb23e608f0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.4.0 new file mode 100644 index 
0000000000000000000000000000000000000000..4dd0a8e0d60d11805f8530da18180ff63289d535 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a513b446477f3de5dcfc29cb052266ca070b81c36e40ed6c643cb85be303eb04 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..27b07e500884afa49ea576b81c68feff25a60789 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9a345a2bc46be114ec7adaff0b50d4aaf4da90e88875a55928464a2494af9e7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..29b5d66d5db240f792db2817ffdc897355b77583 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67e3cbcc9e33c2d8a7791af26a05bfe0107dea4548f3701e15517d9ede9cfe61 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..327447b668a433c1fde7c8b5934b901c60d06b91 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/1.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b7d239d14cb0456c437668d99679dc02d17a7a218b0492649b466015d0a23ed +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..683fa806e65c3f5574b29e39cba9172056ba2315 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8aba2740991191ba91405b7fa9e89e4787b52df10b685b3cf1464fc4d33c1fd2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..4ee7b0088481e060b9e7cd689d21447631d9638d --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d190cd8664e89cf771148f5c69d5d3ca08304838fe8a185ae5f02bf69be1d173 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..33dafd36c4e1260e10fae354c860c9635a67e8c3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:49cd24f3d94b9da685254b045f5b694a3456af28ff089d7770dee77d4d984f90 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..e41ac091a01714627b32f7760fa96a8f1f90af7e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32ab8c258e84c5644ec19a99f7e27771abc6f14751a01178443b26556292302f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.4.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..c91364fe26ec2f9902886f66120d0ac084b67236 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a76be16ca35b7afbfad13b68016034ce4e749c1eb44457ada9615cebdf5afbfc +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..4d59bd1d1be18808d5c4826e2da53da5ccda7064 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11bd0b731008b531c7fd979cc0890fa7e06c75616c0b89b81830f3197e5ce9ef +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..5f147710ef3583dbeeaf44c83180df3d4f1f0f3f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.6.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:302f5b2b3b872a06936e0de2d68ec032278ebb3003d838a84bfae8459df09f97 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..5d963e3e085efc0896764ba72b11fc6d95f1703e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/10.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5f64ba50993e7b7dba7b24dda3427fa53012233b03b601081be143a5ad01f62 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..50923ba57cfdcdb409b5a5ed6f44a7dd8a1c274b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe614edcc883c699bdfffe4fa28d67d200bf701c2a4cacdc548cd6187c549289 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..b73911de4e22629b0d3de9c99690ee4aa85be579 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be5f6f18cae2c6db8d0363f343ad9eb943be9ecd6612357108497f94e6a884df +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..1ff98be9714f1026a38b83a7f9d619861a893789 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25a8a5d5f7cc32d6db32a3477d389de93eeb6a0483647fe24d0f34cb97a86b30 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..4d2e97ba7cce91d4a75d107b11ea3a7501b7263c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a45de52e5eb26587628b8b51b93d93a950768b3e811f5b03393ccdf0c206223 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..dd1d76029bbdd0518eb2e1202db37f7e4309c99c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f72992d0f92c3fa0199c7b3e0b44eb438cd763cf1f35ebd3dc5ac9815ba905ca +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..c3b3fe1e1c11993123347130da2ff8d590024c22 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db1310b67571af0f136bb52ed85fcfa1ca354821aed3657d57bd28c2fe147cd2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..d551017ec769f15c6e73fd6023c10f9a5838b404 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a090c02ed1a2981f02d2074de91bae9aeffb06f30b59c7451adb7004221ff6a0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..ba49753ff0af73cdacafaa89de4faf4e87199e4b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/11.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37c2da3ce03911ecc14b6e74cc49f108b5fe2350940bc6d5c5439578439cf95a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..b2134062bdac6cf5f87df7f54ce0e150386434fa --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1617eae37b93c1496ce25035bace7377f85bf31e8de68885d653d637e209c48a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.1.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..90a7920dd0dad6c7a4708834a1f91a6854c790f8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1c1727d10fd9cd73a8917bc2dcf9d6710d7e1b7e2a767790f6bf9cd2dc1b92f7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..a62eb225cf6a1d0c9710d9cf705e146b01fd57d9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9c8647c47354a310e14b85bb5d970cdb44b852cfea9a6fa2da8ad0fc51b029f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..d638647dd28d4d8da96557b0ef5c34f6772e3cb8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.3.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:8aaa34da6edd9c819775fbe8218047bbfca5b0723725bb9dedf4e7c90da2eb01 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..7345b9581af33c7d074d9fed195d0b5ae50bc831 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9163aad1df209bb703aeb694cd60d762c11573dab7842d03e985e51d87a398b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..5de29c60d2acf0f820447a3250572eae14804042 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2367414e929827e614c9f6eebc35c4873a58c5749ebe1abadc3ecefa48567e3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.6.0 new file mode 100644 index 
0000000000000000000000000000000000000000..07a00471712a5465e64d8fe3aa73374bad4f3260 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:072f1ce12b8038d1d42f688523b37aea800371bd32b7e101553bd512efd8e551 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..6053084e1b211c499285ccdd4e88e43042bf54a3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/12.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb0b6c140a44c421fca96b60e897e1bdb8cef4d86c39ab3aed6f1515320c269e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..afd9d045ed4bbf344cd142e0c18152d57497e24f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa93162b0670f8e2993f8333abf77fa4dd439318553a4c6cd7dbebdda4178799 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..36aa5654f236fd8598a699d8047a33b0c88e8b8f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d19ccd70dcdf2c107a34809a65ad8aa80cfdf830d5744c721d22a67685f574b1 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..110a71e1a014a22c0a864a858ccc2729c8e40db4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:88fa42d0fb6e501ced70c5eff0b9a98af8612f44b734227f43045be00f7c73e0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..bd203650a20776367d68c2b9676c25d06e76cbe4 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:44f07747c3c7453020edaa5d16519747c7458d9c1effaeaf5857962411dd04cc +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..c704786486342199c6d62245c39edf5a4ae9d78f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ecdb7061bd49af4720ecdf8f07bf6533e6d33abdba2964bb72132a0b2ba71bd7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..d0682ca702828e9b2ea7ca1433b67362b986b857 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:895e9c7db2087dba764dcca8102315ea34a9df649521707c5fd5ebda5a97e5cf +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.6.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..26473dbe3659ece344f89f083e89ceb355ab2328 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8cc82021a3a3f9352e50ba53c8474ec80f8ac178de834750d294ebb7978330e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..86567bed02185999e87bf0f65e10f739245c8795 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/13.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abdb6d42dfe16e978d57aba40b663576c198449775d5882330dd50a3d723655e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..5b164d72e45a2f29e1baf9e200f002ccf77d0e58 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.0.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:66fb30a29fb588bff20247cf08b55ce75689ccef2fc33bc36ff695a4cfe9e815 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..aa9c0a70fb6b8c86363f167b08cde5912b2b31da --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3f6b7de590a7fb952dfec4c37015e08486911d87416ad12db3900f123771b44 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..3394064736689db76658d166997b6cc2efb196c8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d5ed38d0fd58fd207135b21f9b3d424ca0c9e29279764bf6ba433f8b8b32d10 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..8840914d362ee28be68701dbf32a8f790e58a616 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc5d15c250129ebae2e1e5499d2b12f89e809d781cc8620e56282bd5e35625b9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..5d25e5ed988133be8bcca4ed38add29982547ea0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fec879c56749f7d4745304c119c383f48097ce1a562ea7818eb465c34de351b9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..81aabbbc59817ce6409d1e483984032c306c6091 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a6109ffbb499c191d00395f08afacaaaa4d65130635a93ed3d5117ec0cace4a +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..2b35cf67a26d28cd9f1fec0eefd538f8005152b3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:448a53b5ff63f0df7a055d69b905fba869840ea7e2ce0343fe67c0c8c24e6671 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..3099e188750985c4bf8c7b87a071bbf6f04a887f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/14.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f05e6a6e5d5e60274ef0cdc4c0d67d990aac9f27381b75a78efb290a8ed4cbcd +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..793da7bb6a7ff551bad75ae468c7958080d8fff8 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c539ce511bb3b39e00bc2e631ce5783513144550fef590309197b1ae9b6c2570 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..381413208ac1d2d3b44d047205d2dffc09556f7e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:128143bb1ae38e1110127678d5c7b2cd6fc01b1b08849d17229b56a254e1e389 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..2a8705734b6df227344cec10f9c09b2c9a24dd51 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f911c3850129275a9cb37232eaf651e36a10161cd091290c112368a95bcfa68 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.3.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..ed6902cfafb86944d1d3c57a0d3665f2789be584 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a8d645370973ae17930c136c8cd3509ecb678679eb31de5ce8b40cd23be41587 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..313986d2116fc86a8829b40e71ea5184abde5088 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b77f08894157ff3f16bf3c98d4c230a3cdbf99b9afd3eea9b48cfd391698bed6 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..271c2912fede044c8ba8760cf1660adfea3cbcfa --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.5.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:0b71898c7870e93d431a4577a559d370a81a2dd1f69806aab4662e5e120f5013 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..3706cfd66f913e1ca9520d265f2404ee3721740a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86de29338b918a1372a55b1da3b9898e96d5b8f23f0eb7b5eb953f2bad0b0130 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..54fa9b35151547bc184b3583090d67a20d499469 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/15.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1e20f6b3f239fae5ce59f49b3ca7dd23bfb9bd14b6e7d40544caec9d5a61502 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..e0790911abe0fe5586ece4ac451fb705e390c850 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b25ce96f7a66dbce6eafbea3bd8331786a9c89ce38ace7ff6bc354e9824d3a4b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..9bc804784a4b5f563a2d9cb97aac5b8196839662 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc9577f3fcd926618137ebba456099d35140f1b37284042c522a6df033e528b8 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..adc60b9a274bbe750022e5b70a096100d0bb7bd8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3b284952982af4d421c9706639225fd7885270f488ad60ecad7a3daad1a706b +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..2f7e795a213d447ac8b6ef97a3d9209a5628d488 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:70968d637cb29d3f39f62467c33d483887ae7fab7773186d86b87cba9603433b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..4d4b2628f5c4bb9d806c3b4f45d2163526c54e70 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:18861adb13de39b7ac79f5e2ea8e0cd7da90b9b84ab05ff2052fd92a5665601d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..ec13f868a45830ba86ad6f03bd9cd04368f6e041 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a1c9c560231576c7187624486bc1577d6358de43a3f177c6069ee3a605e9d2e3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..bc3230365dac59182f78d3cced56dae1db8c5701 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:313623d819342220de947bde1358ebfd3d43fa4d8655b4b401c5b41b90bdbdd9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..e24128b9eed2736c74c2e152dbd80e3aff181e8e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/16.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b04b4ff8c8d4d9761630544d3bd1864e505c3636ef38af9842033ed470af88f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.0.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..53deb1d890a24e5bd8b888ffe2a90d9487312396 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7fe39b63d1f44b628cccd16387af0e244e4dc1cccb67b2259ea22595014c834 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..25e53474a02bf808787c475279721faecb28d9d0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:093679fdb32972ff6f484868ca57b4d4f7151aa471deb49ed30236f002855a0f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..fae06a95cf3035baae548a4e2e2c0433cfcc7e64 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.2.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:d343aea2483ba20e851a3110e2a5c1d776016c2b5fbd1bb83eb2148d1217f49a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..5339c77b756fc6bab627cff893641735c3dab69c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd2bef4f742bd74f8facd610d9ec41c265cc61cda30f4b8bb65d635b4c4c784e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..2c5a073f6c02fd88baf0f2cbca1f142e6827d92b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7c5cccc6d968dfe033c0c0e68c966d42a1a2e6daddf9084513ef9941048db5e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.5.0 new file mode 100644 index 
0000000000000000000000000000000000000000..5fc6c7fec0bee0f8be8279c96042e8d73d7138e3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe00e69c6060f5fea28e032facc66db1a5df2e18feca767a0f3b9052e90766f3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..779f286f43bf4f4586acce2e7184f99607fa5091 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:baf466270c0946b84adc175d279ac9fca449fa2486805b7443a2fbfe6dc23ce1 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..3d1111fc4a5e19f35b6ad39c873f9af503bbdaa4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/17.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f17b5180788a76ffdadeca1d71f030cdd0518c1d679d8191c39f5690b077fdb +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..0508c7c1614002fceb4d4e8b853114798070b815 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ee1d50802a703504bf5150f96b7bfff44e96340e3285c86c210e33ed09d4dab +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..2bd18319b7b01e3b725ed959138d105c66d39d54 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8de19ed65905918335e2b4678eacbb7df03a6de4db35a1923c7b2cdcf1892afa +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..f7442f387e641c6a151cf71003f02fc178475d1e --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6950d0f37f004d534a6b75062c6c743923779b06f0b964b034db6e28c2bff0c8 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..044aacc02fc2e54a1f66754b5473de89e78ba1de --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9a00bb3ddd8c4c02e3cabd8d7d228525f543547d802b4a7aaeccc71ff433a30 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..9663f14d0a162827a9ceb12f98983fd10a9e2adc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09cce326cbcf088cd4dd926ef6544e9ea89687db323b77c44c3d580277577510 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.5.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..5b7916aab16d98424e698ca44864f36e040517d8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2c1bd7f4564166542cb6b1b16e5d045ae753a7f24490325a8dd46c87f91f0825 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..82e4c99e73e04e852f2925111dd622e4e265a5d6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23af0a5c8edd5aec4c822e82fef09259d9e13327442bddf0c2ba47338b839103 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..3284d1fb32fff0514a27521aa755c12c21889119 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/18.7.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:06c701648e1ab0b82b0af63c74b7470ad5b370fab9ac681ebf94baaeb5456853 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..e7817c1d1cd4dae32cd945429523ac1e21ac3c96 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8581bcfe5e76e1cb7c647ddeee3ae3fa0a1793a1fb0cdda9161839fb5f45fb6e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..5590435624e4ad069bb1ef892c31ac9c74b6de1e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f272314fe102e25bb9ef21c51adb56dcd25afc31a1942d82ec856757edd13500 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..defd2b69966115410abd988e3568f43e01b2397a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:487d6becd66b08fa3a3891ff53f212e25094b30d856267e6807925efc94cf724 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..4be37884f7090b6b2aaf965cd40f5ae740697949 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:deab60010363cc58dc97f9ca7ddc2d98ebac5ea76c0bd6c598cd52bececdb201 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..6eb11dca6513e61b61b85e9fbe00e188530c92e9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d597b1275358ed24c613a254ec98fbc530e7c89e6300131514a131e3ea18b69 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..21fc2d5ea67526857235febb3278f519d9e505f4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f475c49b3b3938794f4f8f10a35543ca6e22a0661c43aead11805cb44490ee0d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..3cb8329caf69943b9c09e3164da12b3bfdaad9a3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba1194e804cae2edc0c7e1e80459ce93c120098662c6ad612012e88fc0b769e9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..37f6deed964eb241e1c8c98944fd7307cad6ef8c --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/19.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fd5ab5c51f0333bd5a44dd37469acb40971ab1947b1da5bce38762bd67214291 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..f07d187ba06e5166b9589ba8daeb15a87edef6b6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f1ca43b2199e3e8b65ec2062c0be3ec8732b381b1beee856baf4de9b38dd760 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..7e6c7ea46acabd540d370b8be4f4dfeaba9a399b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7aa2db0b57ad3a3d3194ad911a62f4f69322463f9ee3ab7436cf708c295f4551 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.2.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..f703f4bdf106b5b1d56a237f27fbb0b849d56757 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:94d66df9fd1b919b57adf06bdf0ebf7bdeef160ca85f60368fccc4cbda6fe0b6 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..cc5fe9038b5a2e997622caa3421f8016da3171e8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff3738fb6cc702d57d4b5516a13b8cb4541863c20b466a7feec28d7c4eaac4f0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..6342d72cc49fcdf37ee044189e0c7c3abb11dbef --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:7e4c449c0fa811c1c56032b1f516f24647d19a03bec8150e69f2ce1b812914fe +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..1ac01e1ee7d3d706beef54ebd506d932098f95c9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f93b99847129d5c7dbfeedd25d6e78a0f53d97eaade1897707ccdf10426fe5ec +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..7c9a068519c999bf61c76f61fa70807ad562e6b6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7a81abb908eb1e9af6742ab01264591bef4a16ede3deeee622b437412f61bca4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.7.0 new file mode 100644 index 
0000000000000000000000000000000000000000..5c75c39eb62f4039821fdaacc2a355cfc0d83255 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/2.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8a2d98abcabea11378d9df9b36ddadfa4c442916aed231d407690d9e2fbc941a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..77262e86b700f4e7feebaabd9b69bd34466ca845 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8cca168249b2ddf9c392c85dbe1703e95dd9a866b1cefa808f2eccca0303994d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..c507473da5e5c20314807fd43a455f130bc4a17f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8461788f28f7eed367f0e5ff0b3e14edda50273fa681098b1cd76b568fa2f72 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..a75dec25f732d6d014b677f78a08285d1d463dac --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3917a582d5397682d95239bddb8e3bc6470055905a358c6b39a088ca4d16423 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..27d4e9f5833afd20e0ed6f64ace0123beefd6fb1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a1026a80a424be36ee69ab9d6490a60d5d441bc6b81ea879f879e004d716ba7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..8220b5dae66451718121c8b2ce9f9de563251da8 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:426d2a56feec2c79b95cd44345bbe144fa8c3d6536378397ec9581dea163c830 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..cd148984d8e62c236f084842e1a29070925e2782 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:370d61699cff6f959928280944d6e86fa4886c5ca5f19677a0d2639545649f1f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..88fdea48059404ba51251cd9873609a012fda1c8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:581122de9b086fc000176558fad7a75a56972841bc439784a05965f90c215389 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.7.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..6c6665cddf56124db5f769eef0fb5186299cb505 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/20.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a857488d34833a851adec69e29a6d0c908ea9077900f6df404a2fe44c80845a3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..c88e14e21b485a7a8c919d3b2f7625a10a74441e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e6960ce64dd7d41a80d06b20a513e67e668c2db2ba56bdeadcc0db50232595e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..4658932916276d1cec4794b4bb4964f615cbe55f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.1.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:403fcbb4a5521a780c48da66f7a48377243d17d37a677b129d1167f6b0e8c39c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..a112c787d1054e7fbdafa3cb18a7b4af99744935 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d0ca0ba9ed9bb03ebcd911f944af0dd3126d7444d6de302e13757f1dcbef16a8 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..b439e0a5966cea2d25a7977370c3a2cfee970d8a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66506d0b8c2a3ce890aec27c84a42d141170306c8d376e76f3b7e07919cf3eeb +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.4.0 new file mode 100644 index 
0000000000000000000000000000000000000000..74c5ba2fbac08c23ebf77d53ae7b99eeec86c1af --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:747af4125062c1bcc9171e1e971bc71dcd8fa67c4ec7093af6eacc910912ec90 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..1b293c5675690c754f395d8cfd9c9b2c951c996e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f83f93e2df088dc2c974d8091699ec19ac1b63012fd4353e8abbbdda1d35a2a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..cbd08a9745d8bca7243edbd855cf343f011f326f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:89af0609148c53bcefddcf7b8aff736d07c139cf988a763e1440a3a0c68c5c5d +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..010ceb222f5e0ae3313cb896a4bcb8e774e060fc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/21.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:072694ef53d468226977095b4551be3caead0f9a2903b6b6e5d46bc38691573f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..4c317fd6c3479689b5fe708386df56594e8e677b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:522d120ea63b11f9419acfbcac24d555255283c0677005a2bb4796512ec0499d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..7576460ecb6c2b3349cebe6493b77759628223f8 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe526812d6e177758dc3e9a13e2dbd0fe261062b539250423dacb1233f55f514 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..9c5f05cc1813a26d5107400c266120ddefcad3fc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb9af9ec0f707fb33865e99feb45ed4767a032c6ef9296e1ae323bb0a0c8ce81 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..3b1e49e5665a7cbfad8926f7f11536d1dce2ac4c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9099e0dbacd22917d5a0e63584ade3564851369d13be4e80f173ca94c1b5929 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.4.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..a39c28d3e26817fabb5cb35f1e3d2a98be294a61 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb408fca3e2fc3894d8c3136b3af70f50f8db9881ea17feec06a4894d721d37e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..a1756a8730608eb2201ca2dd1662086c2349ca49 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4d1628a7fe2bcdfc371f161992c52992f1b780d6cee8f8200194c870c9522cc0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..125beaada186d0a1e2a9056330370ba67a17ec0c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.6.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:928b17ae32c043be3edb5753db3517089601b0d8afc9410310b919b51fb5ce17 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..63982901847c09228288a84033f14635249f6718 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/22.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb0557afefac1241c2e5b5b028213ac3ddf5aed09b6972c211050f2d7cc50913 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..1bb4f2a7487342dfdd9ef40550380d9af2773fe2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d81d450ff086c4a20d1fd16d1c0d6fe8a07eaa21047e2c6048bf48d01bef7555 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..4f66a4bcb38f0fb6ad5503223925265703ce33af --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:205df9a1f7e295585babd65921af9c4b36b1dcd4f3e20a2431bc6907f82ad514 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..bdb84a9f323a400476d7fccf56636fa5232198e8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eddf154fa00f4e90dc30cb2f4d205c940cb06d9aa51837769270b29c229f8b7e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..9d31111a4a1c78b78316b6156dc7bfb00ba1d1e1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:64ffeb4c6fdb4bb34b788c984496da9630d0cb722b332eabeba364169c831b30 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..5aff2a5bcad868e80df0a0bd87e1d39b98bfc9d2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca9acc31576d6af2e57516845c345b490bbe79e875af6432d9c52c8a09740541 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..58b2f0b46e7fd1125ea3a27070bcb684834c6fe3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d362e1f86a8408d3ee8bfb178aecb1fa90dfb9785a4ce11ef9995055e35ab737 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..d90eeb3e71c853335126033b25d89b20724f6d64 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c53724a508134983112ee98c2a99a8c6839fe8a295eadb084db40156563aa6c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..105f214e803d244536988571d44465e8e8e34fc8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/23.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f56ca6e98d18f5828a58f65c535c24272b8e9b82ef589b9f6d9c53f2324ac16 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..fecebf5b2d3f3e7bfe72bf8cdefb276d8ee08792 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c174c9bf658a1b0d79ccbc2711eae0ef7243c9e9a50db155b659179db5be6507 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.1.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..b9178cb4da3ca31dc0a3010414f8beb3e21f4ca4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3453034ff231ae8e30e4be7668997d3c6db01ea567271d6501b074c96559b2e0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..1f452453e59d724bb2c18d39ac052e3fc33181a5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24fa9fc2a016279f85c7f7301ed3493d9af19bbcbbe307f52e54621a668ee21b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..9c9beb184efaee9ae8211f5062342e70b7b5bdea --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.3.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:c25ebc02e92bd8a82bc24d27f9efb5ea6faa5bc92e7b9d96a5a8b226a60e6d56 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..35dfd7fd6d35e84c253f40666a42c73fad648e9b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b3af8bf83bcccdbd872f3fbc3a221d1d993e11983355617cf758d5ff9f68ef1 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..5cc9e591e383bdb868b180fa60f324b6af715e97 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a920fa6b3a900189cd1de215d8d09abc6921bf76d1f38fb0731290f518fd622 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.6.0 new file mode 100644 index 
0000000000000000000000000000000000000000..e31f8d50cb1700d4c20d94811cf4a6e4e3f5c27a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea6847bcf67da2a5428b131e1e4b62abc9453d1fe26a6ed42c5076fc42462f30 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..1f03cd358ab164d240460c11c80224725e4b485c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/24.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d77c2c03f20104e80c29da9346bdd038575fa602e6eab1bc45185c8358cf2f44 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..bc55e58c49621dd9211855670cf3f473d7f2fe62 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7436acb0e0b19c58a4945817ffbf25922e443e1e8015351536e6ab284ced3b36 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..44182d3e737831e305ed8d36522b7476b39be63e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bbe4f1883bc407ab618ce8aa563a52a15c62ec21aa36b7b7c4125c603f0c8e3f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..16375ffef63cd4f83f58f1eb476a4949db3e65f1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f98cae2d0869bcfb8c2e841d0f74d8f11a8bcfb079f8e7b1ed8d9d3b5826533 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..ba6894af1c92c146ec9a76bb4839a632ddcbe10d --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1de2f30ae6b4f98c33acc14e2440b0eedc48090e95b5c1f28ddba7131d44bed +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..10a604dff75d0385054b894665fbe41fb12bb85b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:77c1f1727c7eee9ca812e07ec4f70557ad472d8b97f083fb228df54212cd884c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..85567d5354c9e411d87e552bbfb6452c6ecdf4cf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bd5d8624d2153fe552433e06ea4ee0fd20551fbc1696080fb7516afc55255f6 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.6.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..86e45a7b8dd0d8ae3ebbd7d88d1fb3643c37c50c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eeae56aac892357b79f1dad93ac23621f5ab04f70a3e62174625918d14f28db3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..9679fdea1c5db1a5a7c8fc5cef20264c3785d6ce --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/25.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0614f4605e34cd6f80b5840d1feb7712a84eda9921bb01f5d05d0516e8690ddc +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..f59d115551f0da1a91329fa2e09c78795b377519 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.0.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:e2666654c3910b7447c69435393d0ac0adf58a4acc3fb5fc46e266fab9c93674 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..29222085418239954859648d14498f80f133d1c3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bea1c473a6df63360037038a0222b2e266fac5b893a62fe432073c89a7cdd5cb +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..d1f611e0e559f3947e7732c1c6c8660c088297cb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7523fbf841bcd4efe3601c5602a1a1c0628c6de898b38e5a6e1d0e96b5b77626 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..dd8ca7aafad1ea3387808354d41aecaa3e459663 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f425d4866bff1d5277e74f26a831707fd054e8aef5f4f27c262df665690acbf +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..5e1b24e88d8f67f5305673115d89474592f08037 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1103bbcd2d36cdc6b8c5bf7356991940d6e3cde7aa882824735d2b23d8d105c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..a414b0eb3f530bf6962167c0827450ca3547f5ed --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e77c3cc8f721b82cb8d0582c21fbbf3549563ee4037127b8dec4f4fbb43e52d +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..c62a63ba72fcc076c5ac2dab29180ed8e55e99d0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c8b13afbfdf62c4be2a202198147cbe7b7b8336a3b9a29fc2c36c9f190bc841 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..237588eeaeb2c83c437ea0f4b6697b5057c256f0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/26.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:876f26594a4d9aef555b611e2ed270f1c8d601fedaf1eaeea278127bb1a2ea3b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ee70176e2d4ec836e12297179ed9c94a157afcbe --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3675781946d73ba2180c5ad9be4f7aed12deb8583b952873fe3739700236fa2d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..cd4e0844fdf4c75ea6b5f9de049625e0a4ade753 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51bf5c9357b890962588d2bc6e7cdbefb59e53b18611089808db69c8436d53ee +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..35b489dfb9cb179272aedd28390b1f9e07a61301 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1005ba60cd6ba940f515ba4dcbeda0e63bf1b7be767c0549c7a36e2a73abeaa3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.3.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..524c6f579df0f8c996ff738cb7657308fb951808 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e8beda4700eb7b72f814812159b84c1bb71628fe133dec96dbb227691a29432 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..7e9f477c9bfe39cd585ced999049693702801be5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f6fe48627aa2f9b909403488b4d7947f036b541d5c5616290fa570fab6c909f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..e77cec63a4127d29bbf373e1b6e113f32f022e88 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.5.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:b34b8e2a214f7d3f210e2163d6b659db24677715d0a85260e4c84ae2ef14e2ed +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..2bf09360db4a2e1253efb4ce4514b3f5e6c02947 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9c7e63bffac38bc9d01c5af26516c941051d1e090dad905f4509e51ac7af8d0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..279c7c8161c477c9b15c2e4bcbb3b53244d1c0c6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/27.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33052feffbe386623eae3fd9b9f1bb769dd817322c2ce00dc3073efc3a5ce2e7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..2700f8aeeab1daacc66f1c4b3307af4a95c1e478 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3377eb740b160b761be97f39264e8e18b5b4e6cad90c794ad522acc844e6d904 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..3cee5927f9aeee595e73cad2ca36b64df228a7cf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:584e1b5120afc8b7186c9f46c151983947ada866dd793e101f901d0eadb66453 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..5c39f5b31bebbac1e43746e2650ff06e4dc51c05 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d16091873cf1aa2d47dc8407902fb896883fbf4cdf574a30c83851d397cff30 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..3b80a60dcd91004f22e49c747a627e907fd906bf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e86aeaf2213480f77330bbe0c4f7682f68e2af9192d97825dee38b6ae9a65e7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..de79586424cc1353945c52570085f09d90bb4b03 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e8c47aa773d10c8ca0c3cf407773217e5e07c80a6e5ecb623f757d7250a76ca +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..7eb5402ef68ce4a7e6caf6d199f6e7ab585d1675 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:32da27ce896ecf94e1b8f7dfebb7fe175269f918f186f1bde45872d06114a278 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..cec870a28b620bfc76cfe8bfc5471a2073f9797b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b6347d233bee2c8f4409d7c3d8e4e5f6f59b1b3465f091c4e3bb01e0bdef2ab +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..0dd052dbbb33db8681b42027957a8cfff0d0081a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/3.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:149520750c728e6b790ceb6ab40e2a3c8ea4bac347c101dc9091290f9538087b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.0.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..0838da0115a9a2c35555fe1847d8312e3f3bdf1d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a49d6f81f453a59ea1e2bca64aa3e118e40c96fdb5319dd0b015d432c5cde738 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..9acb2e60a589a348800a533a6c71a4b57b91c105 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d608ee5fe5eead59e449724a0847f3e1aa9bf79387390fde6ab5b2f2a95a3808 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..45dc8bb1ed63b5bde858dd6f4e46509baa82d059 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:2109f39a205b83573575a95f7da0afa2ae166260ea4755dde5667fb678d7f18c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..fa844cd5e47cc28d4fb91bd4556500a66b20b5dc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:980b998b7ad856fc65460b22cb3595334bef9e57eb42c17e7468fd3eefee26d6 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..3ceb48105fd80237b186892271677746aad58594 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:034291afefc55cf5cacd0901b2bffc877832acb2832ef2a77bfe0172d8be9253 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.5.0 new file mode 100644 index 
0000000000000000000000000000000000000000..2a4d3b70bf764b2db9ba5aaac5f99d0e88f69be7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:feb7a30b6c04c618c8f70575215573fdd9a0843ca33a15bb481a0791743e74d9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..74206a61d0a0c0a40fd2c808b7cf088db4ee887a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:983b5754137a299c68bd12f45a809f402c5a651b771ccc0512998ed438b9080c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..cf32e76e3bef70af805840ea7f7506c37fb38544 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/4.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a2ecc93742c0aa6b55e307314d834bd29288a55b476914c4f9539e4d387551f +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..a765ef922ef350333346f0ac8730282193d61640 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d756892c06c4f6617c1e4e228ad95a2c13e65d56a24d86f8a880ec77f0a650d1 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..a2548013a90d5ec59fb83be25d25ab076ac8da14 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:48137386cb0b2d079cf1b0f93d9e15c320c192420031358d6cbcdcbde923cdf4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..fbf8ff27306a59567388bc388216a2672dbb8ac1 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f523f1d6d120c3f2295c6b94a21cfb06914dc1b820d3a747fee9c356e902ace +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..61d815047a983755455858e70643d4dc9e3c3581 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30606a327d96a912c03ef7bb43c0108094ce8075707be45d215299ec38e4eedb +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..b9a3b72d09fa6fbce9e9171f42c0943f0f92dd73 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35de93e4b802d69064452059a65c56f3784313127b4081e8021a7c9ead1dda36 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.5.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..a3edfecf02d47bff7257992354746e2bd72d8cd9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e8fd0f7b32fb44cd89f3d3b2db8e6b27b21c458725c5a84053fab4bcd8c587c7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..707d8960c80551944f8f68e851fde406efd1bc22 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11ddffe346c0a260f7a62753f2f3007c307f772da5c7df1e7508acfdfc5becd3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..f5486acb6d7ea6ee6927e0ff5a404942b9473700 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/5.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:f42ddec0562ed24b195e2e8acc67054971dfad212b31c6e3d87a09ea8bc2e4af +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..7f17969be36727aee6efaf347cb6ddc0e3de69c6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae0e35c684677d9047ae641ed685b640d98afd82f273f948b62bbaa02228a65e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..2e41fda30536697da652f898f95e4afbb387cbce --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02251218597bee042fca3dee99cfd1631a5b67c2acf59a081a75b922a4835cbb +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..b65acd302ce6fb0e8695ddc9df2c85f92d9590e3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:373b737743ab89314f56f0f503e8f971e9d2a54ef6ea4609bd195802ddfd9d32 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..dc412b78c8473195462c5f3b6487ced690dbd82d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92e83ec1515fd044fcce968462b869a0ec1699c01c00fc35a92569fa3fb1ee07 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..b5cbb8eacd755167e04aab406879d822c0f45ecd --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72392232917d08a485c41a6fb5dd5fde76034e3454346e520ea8636fb03ec604 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..15663a80f95cc6eae5d6e30ad2fcb5fe12b2ca33 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea786ade01e5a76017acf9590c48bee056e11f695c42bcc32d547ca678cce15e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..3e31203ef48c4a46914c520fad839457d170a2f7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0b35896d9ddaf411b76c7d3fe6eb2c3efedc7f71fbfc5b2a3bebc2aefbaf7ff +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..7c545a8f5ed3573637bf91154b445a12072dc543 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/6.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f1b6a785dd016af2e916fe563640fc0fba503bfb08e11bead1b28e9b6dd6ea2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..6ee259a147df53f351a7fdc0fcf9a51310ae7e1a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4927425234061c2a43be4e7c46d5dd51984604dde5d4bfda6acfa985ef49a5ef +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..81cbb7bdb00d1ec5078b6d51401641a1c5a307be --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8efc67b6dd20bba8e9e3a35eb4e64a223a06d5dd8bf21a8324958c5bc33cee2b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.2.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..40720f4df923a6f82cbb429e8cb1d7b727cfe44a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:239af9882779f78daf08400d8215a71443006a16746c01ebe1b6f0a11205b8e7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..f228853f42a3e014579d3ac41054a8fb346b6c42 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7baa63cfc584e30e3a4d4636989d92b465ee3041c80c734dc767e31f2b2115b4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..07f1335ff096590e297aefaaa2085daa5225b88a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:68eab4b316003ddfd6c5d671ad7a2895141d4013c97a0521cbf962a687e27501 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..201e8b0b2407c2c641d0c9fc23997cf7f26cfeeb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc676d827508716b263257b7e2cd740acb7c972d5a15093c4c532b40b0702c35 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..3ba1da38ac90ea44163b4c92d3f50707eb06d0f4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b2d4eba0f03e4f13fceeb8790c703775e9d2e3d866b913db16e0cb1db40f0552 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.7.0 new file mode 100644 index 
0000000000000000000000000000000000000000..a56b6ff8cb4c28b2ab0a134fae9afe29c6079642 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/7.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f1e1dac0b89f9b4d78a444d3e76278f0aa1eb7796b02ee5352d327d7cc31c81 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..c8eee00f1e6558dc1e1364873b791a116af9cd70 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b08635108278d325d8b6ef6261bb0e700608faab38e084e587d5dfb7f99e9e3f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..92c89f786c4c1b364bbf97c46f3fbcaf52c3ecd6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5ae91e6e1898f525c49bba39fb54ea3ff7b4bbeb6c06d3433ba9e442b0398881 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..57002be9279cd543a321ea2e1c2c73ff8b3873cd --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51650884713baa7999852a8a2d0c083a7f579e3e3dfdfa6cb82aa807e800d3bc +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..eea175abbcf2a68d79feb7c56b4981deaef2f6fc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c02c4a8ce4c35ddcdd7ea822e1b9a5e9c02197acde13a2ca364acf9e5074c2c8 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.4.0 new file mode 100644 index 0000000000000000000000000000000000000000..e833b7259755e81d394dc7b0e7be311f5937a1dc --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06c9027bed7329207a57f2ec7a928bf46f33624fb676466a742e8117abb53146 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..4ac0f7c25aa4cbc2c05cc4823afa6efa834ef89e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5678d0ddb4b3b1c605f932307fd4d4defb4f2ac3e5bff750f6d36e36a22618b2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..0b9f12d51f2fb613abc82c565c4f7dc0140009fc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02e75a76741dd5c313a07f3e9f772c0cb7cdd01b9c32bb729220d97c1605451f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.7.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..b3a39ec02fde3ea41ce5bd457016c81948f783cb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/8.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb63518c3c07636c924d09579a86b87d22bc8c79488260f163d7e0329ffd79f8 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..8b3db9e51da040f4acb7b2e8401f64bd94bc9a86 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e7e4de519ebc8b527cc60d356597ee52cc5eac008679a7915293ac16a8f0893 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..f75d5527aede754854bed2496e45e4da0d6df22c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:255e3d3574154dc4c1e4586022ebeb89b1f9b8eabb2932bbc9d01f08be98b79d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..9c8936f1c8543d2d1f68a953a033210b678aef00 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:653a785c418bec3295ccec4fc4a5f3d3752813aaeedbfd86c9852c98e304c1db +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..1d4b69f0711bc93d229b6cc0c36dac1a81912bee --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a90af8a3fdbefe15d91096771282b16c903d421e2a1ebc9dbf171599663fee51 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.4.0 new file mode 100644 index 
0000000000000000000000000000000000000000..5887336defbb86e76f3b283f0fb59cf988371815 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.4.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0592da422eb203d0e1b1d8fa24ced8d29359ab9fa6245054ea49ee698b4446ba +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.5.0 new file mode 100644 index 0000000000000000000000000000000000000000..4a2ed14d0a00031903603f766acf06f127b024ca --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.5.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:547971756d3d3cab3b6456c2e210aa69b3bde8ef96d16dba810420535c65bb36 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.6.0 new file mode 100644 index 0000000000000000000000000000000000000000..f4516ab202f5ab63323bc4ad680d6bc63e7de511 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.6.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9b7d2463b783e72788539d09b8652ded2ab5a393b87f50ce05e69646925b9bc +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.7.0 new file mode 100644 index 0000000000000000000000000000000000000000..509da5cc9e8848c32d87ff60072a399919813bad --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc1.weight/9.7.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:93ca594f7d221a878457eae107b7e66d5085fdb0bd31934d2a5b2b95458d7d0c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_0_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_0_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdfd5deaa7cef9ec35e981f5fde6534cb556554c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_0_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63f63a650a1e81efb1222bec5ea786efd0bfb9a5e80530f442bf91f9acdbf8df +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_10_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_10_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a440038d68af2458a2032eeaf2e1fa79cf5c8333 --- /dev/null 
+++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_10_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:227c205836c7bbcc512b59dc9008d0c412699bd03df9aebfee0b7bd3c3e329c4 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_11_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_11_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..609ac2f1bdff7c62b652fb7f1d0281a8e154bb08 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_11_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b05ac4f60393e0c09711fc249a87bc2e036047e8dcc5d63402503e7d1d662c31 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_12_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_12_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..8417bad1d79124c6d415b688488f813e9aa59250 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_12_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:764ad22a31d074c0951a6a5dbd73f033ca459ecbb5ac362236981e8fe12da56c +size 1840 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_13_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_13_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..e01a6610c2cabfee5ab8f23b7be40f83fa72adfe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_13_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e620f1086ba7f104be16365fc69e8487a32d8bd7acbdf63f87bf802447b6466d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_14_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_14_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7dc479dce1cf4d854fed381732cddbded4919da --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_14_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6910bdfa718b3600854184a6b1f32a8e9d9be3ce10c17f12fe6db79120786a2f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_15_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_15_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..9dc5d5187f1ec2891ca599e47c932ee715c6842d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_15_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9acd9fb25e6ab8a85b6ab125c958ee5480710dab44592b1fb84a4eb69872a013 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_16_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_16_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b41c96bd9ae3456f318bd68a6b349bcac76020d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_16_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:885ef049145a29bcdab6fefc355a8e1236bd882c226930cf530a3ba0bd4ee721 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_17_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_17_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..4aa15a759da172cd1b03640bc9277aa61350bfaf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_17_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22bfd3234ccd2293b4de4762d491d064b46cb4558254220f7634545715025838 +size 1840 
diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_18_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_18_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..2881e7895900a5d72eb8a991ee08c58f20f253c8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_18_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c02c0faf91090dbe1822a700e9bc7f91f16e835f7f826eab91a0443d48cac46 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_19_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_19_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..49e7902ecb6a4b2823a4714d278459095b09acd6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_19_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf409a5924123568716ca4dcbcd3b5f1ba9624a1713a170c91d025cb035207cc +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_1_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_1_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..b0775267108891887248012335887ef50e474889 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_1_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e8b079d0103f30cd750bc238764389ba0c37424878264066e76bfdbe45c0562 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_20_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_20_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..aed8bf7a92522b09c2c6d802baf18c6420c7ab0e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_20_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2188e59e72780c58a3acd8e6ff7fe03f373d14a42a0e36c69f1583b349cbc4a7 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_21_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_21_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..54c8e397876158258c7e3d55678ec2b5b76b9ca9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_21_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5a40ef6b59bc7e2ac745df57f28cc08ba614e13e73c12c7a16ac9e3bb005a74 +size 1840 
diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_22_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_22_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c74304cc0514ddcbdfda7fc598edaee49f7ca20c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_22_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb827b13ea508701f2eb587f7c075983c97a7aedac472cc63f237756c5961c76 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_23_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_23_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..5496445b8f4849c98c75f3e0dc375b647d7e3c04 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_23_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a1927306a894bb424ec9e0ce4a672d7afb28d3bb9d5db1c6e1268c2ec58232a +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_24_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_24_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..5c68fd9e24c7965a63ad4bbcfa04638f17d6c974 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_24_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84ab4816acfe4bf4df6814b170665c0b14ff82e64c6a3db350780a5ee58961e1 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_25_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_25_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a1dab169defb9b542680af169499ec57bbc631cf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_25_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e0bc28d64fcfc1620f51ba7912eb5f3e6757292e45e4256c2d11914214bbc77 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_26_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_26_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..97bd93532dc4ef87c88fdfb3f5e5e4668e1ce3d0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_26_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4e987570aaa427c84d57db99f8a2529370b6bb4bb6e61a5c0a280fc3f8b1f3f +size 1840 
diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_27_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_27_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c3239fcff67bf15ce49e9e79e187324a699095c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_27_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:addb78803cdfa5fcc25acb716a18e71def373b29af9d89294cc8bdda95b3757d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_2_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_2_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7e8b538fb206407dbb80bc34f1b59c33d352e864 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_2_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce21d6653df4f0be7621f065b58aa61c970bef1c6dbbcbf018391ba742f93e4d +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_3_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_3_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..cb6af6b317ec0581d251e39fc0540030056be26d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_3_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e17ee9ca84caa8e84a29eae3086e03cdcd8242fa586638054520904b7d5811a +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_4_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_4_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..754f4bcfa097f57f7181dee82f61b5288cb5671e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_4_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ceab42286b21b2bc4a581e3b7a3cb3de527ddf20deea8354d87bef6ec9b8b648 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_5_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_5_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..58e175f50a43d4c1deaeb2acbfc26921669fa447 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_5_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c10d49188641f77ed026550da10a961d3e7096ed1c71fcdaf8c4a4964dd5d8b2 +size 1836 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_6_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_6_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f071d0d9ee8e42d35e1ac4d622c1d71a4ba7906 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_6_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9b99c08a5951d88fddc30f73527ab22295e4bdc040febc36a47616428f879fc +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_7_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_7_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c301c90b7efddcee8db52cbcfc196386ac6c216e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_7_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65b2a3db54c013bd9849efb2db9c10758012e0f5a5d1f31397e97482756600e6 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_8_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_8_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..c7ca746be773968cf1efd88a58dbbc7850baa321 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_8_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9e62e3c64c74eceaaa2bbd8b35484b88271cfe637474693ce93c978f0bed7ad +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_9_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_9_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b11a1c96a58d2a4085c34fec35f3ac4e9f30524 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2._extra_state/shard_9_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0069d31c4f93e27495b18776b0c2fe67027ca0e663f5174f085b69c0cd60df36 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..7614c2228d0ff8117fbcb68016694d2d3fb66f53 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/.zarray @@ -0,0 +1,18 @@ +{ + "chunks": [ + 1, + 3072, + 6144 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 28, + 3072, + 
24576 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..02598172abe0283459ca821294d368d82993c14c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a769e50fc98333a327732b1ecdb589f90026717503698f03e2361c62499ce97c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..882917638af1493abf27ad39d1f1ed8f4fc84be4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:173dba916b83617776f78f75f538c204db0f334693a58975fc9dede4f4c54914 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..b0d4e54ebf5177fa47ca380c5acc8faefbb1648c --- 
/dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65967ad06ab8316802dfc02f05c8d1a6e1d18083429ffceea252ed433770cf73 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..a9d92366f3a984a585d75e9b069be4cc3eab5b28 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/0.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:53cf3ffa008f10725c44e1a12608e558fb519aed57b6c8396646ea29db83e330 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..abf96a640f79173a5f70f75e55677089b9bfea7d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3fd84c40b18a6aa74efb9964f0ae3253e2e6755b3a23cd53866bc5d9dd47205d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.1 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..405b0a58c8a3e767d0d5cbda83d7b52a2c5400f2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:355e2b355457ac59b1bb42c4e053beb0aa3cd9428b649940a7a7f77b6351cc60 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..c09b62d3b808e2a4db18b335085765a9c36f3f76 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16ce4efcc5c398b30cb117d7450183f0db19267f1320c5572d311c1634b16f13 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..2ffbc2233a9a500ef9d8ef44ac44a95acfbc791e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/1.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:a1fdb77762337a3aa2ede002bfef6372bb46b3271ecbea56b9b79d1eba4175a0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..d4234f3aaaf564cdd01c5bdcb77f06a70cd74a44 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:975af6301f28a4d10378c658c5a6e3772f69025ac3f99b7fb4288d5ac2c3d25e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..68d944f6b9060980c5abfbbaf7a09f02358c4500 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:deb18d6dff51553b721e8736760f28d576d4e152d96d3335467a93ff72b0ae53 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..cd2184ecb64998ae47475d06b81113f93b90534d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:851247e9dc939655790c44ebd91084235f10ea3d87a079bba66f3e925e267942 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..259de00584db8e483b1b82a25ab60067dd9ceee5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/10.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3903b756320dcf99405aa318da7ac79d5b12d03a9a6a5a0988ba8e05f5c93f72 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..df19277fc76055712920e47c02720ece557fc031 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e78a5767bb2676f5118add6d5d6a83b43133a1516043d2bd80f5642ff9c0c9d +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..5ceee5681756f7ec6c9d5d0ecd172b7b11ac6e32 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:15c33e63fe6effb3211bd7cae0bdf39fe77f9bcd55a9e19c4653819e62b0bd4f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..0414802a14d9a047c101763ff31673a2997a11bb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aebb357ac26acc35d59d6318f5c25cfd6d756a89d127df8e27dc685bf49ffe62 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..289c091b2335907ac62c215488f4a46311cf895a --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/11.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6aa5e6e61953f8b4803ceeb23f7e61836878145a13bf6647e5bdb05e1f71f6cf +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..23b1a94332206f7472721b7cde64115abc211511 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f62837e387d1149b9e380de4a8f1b77062d148d9b44e5342e6400098b86c34cf +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..400712ec28c4a46a4f58c71951f2d21c702f84bf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47490e5d08a22703a0c1399c5b2db24db61a10c0633dabe16debdea26b6fb7e3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.2 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..67a2cf27a962d1a85bc5804a041a1512d6bc38f6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f6d92dd3c395e11ae6dd4d5b3712423af35275b1d9276470a541fa3f8bdbc81 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..306c1c8b039abc5668e53e9dd0789abc301e6068 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/12.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35b359221e72ea23f27c78c5dd92dd8469cb32ea778905c109a737b1a9b77ae4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..f9c5bf0a81f26f1df5d6957c39f0e4d86112ef49 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:979f44731e977fec84e9d3a2e433d9ed9c981a3dc61e935b6a695bd61e771872 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..2b7902da8aed1a359925f67688112f4a99773a25 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:500bb1005e9d514f2711c95c89faf8ed54d1817745f53d1999f6d4fa7cfe20ce +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..e890469ca3f5ce8c7246939e1460dc2c3f815012 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:86f3fe89a01d50fd74e03a8d75ad06195f21c50acd48c5231d1514c9256c6932 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..a9ad1e127084b60b2383b8414cf70914f2f94b79 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/13.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:96ede3f4f06920bce7d458bc1f3a53b7e7e44387ff5e0c85a31bc7604ac43813 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ffad042f9971a7deba34cd4185d7beae1db7cae2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e3dec206c967a1d3c975ad7aac11b43e4ca1f1baaceb2837eb42fa27d78b348 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..a411a267444341c969cd6f91f031276e8f464025 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f090fbc562e4f68a0e59995f229691717d5a09036e0f4b2b840fde57e97e880 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..5adcd50c6b2c283ad987266ffa3748feb3cb8026 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6982b89a5c0ca2712bd59a2302396a8aedfd6e001e59f2194e00673c7535a29d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..09b28ffffb35f3c6b29dfc37e6e6e27aed71b0b4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/14.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16421e912493edb4f41aae8c1ed557d2d89831b7c192c414d1fb22a3acfabda2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..5f69b60b025b554bd7a6b54123a3a52c0fe5d567 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b2ccda7e30fc260264f3b946033e93e7a86a83d92070df8b6cebcaa98dc22e21 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..0179a0f121e7f2953ab0406f26a8b6aab1324348 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:45092094aa585a128ff98646fd68a20f4fd361625df868d6055b3a84ec0dbd8b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..aa21fdc4a6a69b8ac50e44cfdc6c728779b6ca19 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f087d40f953d5222fb2b52750fef37273fb6840a2cbb8df8df8f6430e494821e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.3 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..a6ac828b6b91a02acf7eb74a3028efa9c7f761b1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/15.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ca3a72a8c56a7e88150cf914c53a55768fed3edef251838edf5ee32b23832e9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..69d7590d00177cfd3b4e560d2aac576bc3b7d26a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b668e1629b28b97e31431756b12fa38c07814070f0ada1263278ba59fe5a1dd +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..b9173c231c4f48ce5779de22dd0f96c8db1b540c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.1 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:714334ea3bc9397ec9bf2060d05d8e90e1ae3f85f401de65cb8033fab93479ff +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..75cfcd2df4ff768d8a1c107bdfbdda681cec34bf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb388266e3f8c6c51425e7225b1d3bb08201307edd68e8b941ffc41ed9a47375 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..06e59b9cb34bcb43b5e451a6f1604ee672bb462e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/16.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:411bccbb1c73ec144966028c4a18e2ac340f22a289ff3aaee0dc68dccd2376b3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..5c2a2facdcb5a34fe0d542579f3eda0d7846ac74 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d500ff85fb1590ae820fe346c412c283e239de1c398146f33790bde8ad7b749f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..1c2699a553f6939aeeee1326627f7433ed53d32b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f710a8056c311ffe6c627bda0277b9e27342030ad3fc463c391871d621ae6026 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..06848a2953b32ba15cada15979e91ab9303683e0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05680fa21febe3e233fc0e925e5354d1a0385a58b34c3db161bdb005025d40d4 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..4df64a7d6b81176a7bb09345c790cb086b52df8e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/17.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03a0a7b0bf3b590702d454c1a86f6f405682c1ad5107a708d32848460e452c5d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..c75d8d1058d9d295b712123116bd24c736fbfbb8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:10444a55fcb00155737b95836eac07fe0e1082fa8a9fd1607f6c54a94ea07c61 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..6f619d5a399c779ef8ed5cc85b52d772e2a5ce2a --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e3596740df644bf53c0f1c72c88c1ce9301a0d325d1af006669c4e92f05cb27 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..94d1d6c265f6d969797470b4f69643883144b1d4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:385a410abac3a34ee2befd85a4e3b195ce4f88af801dc3a8ec408cc8ddab8c75 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..8b5a88556e566884d62ec4cd6d1429ae852a05f4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/18.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b9ac1a0b3fbba5e4a60a3318daf0f16f899ed21b894fdec0de7477df771ac55 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..b34d1bbf7f899aca440febf4ecc4d038d42d9bb6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2874590e1489918e646509b98398e6b6987f054f6350b5ed91b3796b41276cc9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..65da4b0050dfb3247c7d82e7b634cb7430c58f4b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91b88644b21ba717598391256ad7b73404c1aa36f5d6714e49dbabe07341ada4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..948869a19b1cceb27c988fea9152252b6cf7987a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.2 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a604f7e985bfed8699a81a9e2cf1b6aa8eacbc0f8970fa09ad9533dd921f43d9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..c9c41d3dd2d7148ba142dde4eeb8a2c067503c80 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/19.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19efc54a7d56a4d4356d99988c8ccd4d8f366d7190c59a7317207cada3b45644 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..2ec873c56a2a1e593401f914fbd5f84a038d27d6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:088698f4c5de929277c0474eb7b48f4de94187ffe62bccd9d4e29c242f02d7a8 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..ecbd1965ee6032cc22644903c7b938523adf5fa6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14a109ab1231de5e257df92ca6227f6545487a8ec9f0c0ac6e78b4f9d9e64abd +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..25e98c1ca2c1831e5208648f2a2d07afbd46178c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:035c2b3a39f06f9e8a257c9a292adbcc841cac9adeb8c4a6bae14a4b1472338f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..2288c1cbe0116d3d938fb40e817a598650634a98 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/2.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:310ccb7884cfb42a40d10ddb8136bb8c2060bf00035b96856b0653c5686386bc +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..9dcab6dc5f857d3e259b50b7ded015d344fe8102 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c05762efda3b1f1dcccb07dedf50564e4aa202d348800c590366eb12bad31d7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..77021a0d2a77125132ae92f8d0646ffa27bf513f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:df0a1177ad03d3ecd4a336b9516c04ddecbb0c04b7ce51bd10567034be127f69 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..8f7441ec18af9faaf38f124862d904d1c9ad5ad2 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be6ac44f4b132039661d16c22d7a78ee9746bc6116ff9dcf65bc1c547c49b646 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..46b751b91f52f653f853a6fe248e92a9db06b5f2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/20.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b20930e30958383766476c1348e0f0a04c864bd4ade7528aa86728cc13f9979 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ad30041b9676b9a165580db19073204b22b2be82 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a13c08bf35de415cd1aa7afc6847e736b9caf72ac7f39b65bb1fd98a68f79ccf +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.1 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..acd8657511fd47e4058c5fb627c8ab642c1be852 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf1b1a65dd1be679c84f88dffd5211e30e11885cf1430415ee533c0d34fb7778 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..424c48d1db886d39940eb68c3060556e0287566b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c92e651161091b036ba4e5656a7658ceed21fd945393328b2c89d0490f90f2b2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..40bebdb9ed0069faff251eed28834b56c01e9d05 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/21.0.3 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:a52748122586cee18f5dd6fea5b8b239ce7310be35307ea6773ce01fdde63da3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..0973c047c9e166ddab0ba692aff148d8d74829b3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:816ab1dc2cbfdefcf8375dc58674305f18bdabef3b0ec591058e50778fbc5f49 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..6f1f90f1cbaadd3f498fb7c5be74a983061583d3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4f764d78bdfcc956ed52192757315528c86a39001ed64c8ab943edb90381e43 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..741dcaefad361b39bf809f42ebbc0dde67cbbf32 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc222c236fbe6f99a900de097f4d07d265a2dc65679ca3bdef583a5416398f29 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..f05667de88b21078fc2c6fa5036263cba3a69186 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/22.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fd755ca06662c767dd6fda6e20a9a7db89a0ea4fc11670100ac665e9b674f02 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ab179a9b2932ea38d4cc0edecbfd585c19ebddbe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8633bfbcd0b43af78f79ebf7ebb4108af1475f62b5332c6fcb50f44882b0ccf6 +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..7947b6dc26317bfb9d572a74a2164fe2d90a20b3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e75e45867a947553bf66918dd6b9803ecc5d2f66a8e68ebc20622854bafad86 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..0b8c0bfac006e8b89aec06603a9f72b4f22bd2f0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd42476d855f692b8bbc12cd2eae3edb35f46a1ba6e0b1e47e2cf3ce30bc060c +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..151177d1cf5c7a3b873b8c44c711a5d6f4d9a0e3 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/23.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05d2844ffc2c0fdd8a9a2cc029d6d8b879cb608002de6539a5bdcc3e938fdcf6 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..974fd870eeab6c3311f8eaf1695afcb26f7bab44 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d52e935c8da5919ab37c3709dddca8c1a8d7e8c44865de8173df1658ddf48fd +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..548ba039c0da788386e55b8fea56fd6400de42e6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33afadf8fbcfa610dd4d52566bf2d4d439d7de2540f0d78e33de1063148188e2 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.2 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..d8a3e28f12794469126b177fd5e4cf5c24f6ff2c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc5efed8a3d08709c5e68da1757ed5b3318e20c40c937ab0bf9cee2493c4dca0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..a72c1484afda31b8b0e35dd63b39285c960fcea5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/24.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:050b96f914d0f7e9614992af9d853dfc7bec75b2bf46a83a64a5f9d58c705351 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..71cdc4e2afa748e82d8231e25e98bd472a847620 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.0 @@ -0,0 +1,3 @@ +version 
https://git-lfs.github.com/spec/v1 +oid sha256:8d203e86905799cd00069c2a58127ce7677eb7080c1e7f21c7198326ad255875 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..bc67273330993e2c7d69c46f130a8505f02c0e10 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09d707f1f29dea0cca1b8843e9b8c07f2a6e8924402cf1b6436233d45e397cb9 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..2ccb2d2290bddf28652f2bd35bd8d404769cb4fe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:87282b520a1f2b4f02a628a32ca1a840655fcbfaf4bc75dbe1cb8f72e851ba86 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..48a2e06b317886c55a0c4f4dc94c493c01ba1f40 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/25.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43c7f850f77186257ea61ebcad61785f4dc659f9f8d7ad6ccead2877edea5009 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..502aade4bb60a827d0304c18f3b5fbacd0f8acde --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:865d916112466951a698521cced859ab63119b7e57d509cbc902f71f9421bbc0 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..bf254160712147e2756395daa4374d16fef6d9f5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:752c78c8442d5e2b8e57e015e9bb552e6bb2e08471e548efc58969147f9caace +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..a4b62fed87e214514c6dcf55b56daa5068d4e4af --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f3af9e7c596e438872bafe0e3f8837cbad52dee6b824decd30299a97edcbfbaa +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..248155d20a451cea1836a9482d947e55d2412f22 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/26.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7cc3b30e5852e19b2b62f6099c5655a16ed882dae24921b5529c935528596801 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..0dab8e36cb840139bad90253e4c64a369de87a12 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:05bbbf6cc670d9dfca9414d78256e740a09bc1a949a5a2bf7715211fb9421dad +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..3ec06d7b3eaf52ea9d4b6ca89f2f7bbf0722f8b3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4400ff63837e7ac8d88b4c2c17532cc0d434efe1da257bee295126817e0a979f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..782cbd7f7b6070c570dac97a7fed6f8b19f32a97 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2861b5a2a5ffe2923be25728ae38c9c53d5069a9e2181f115c1cb6a8a9a86e7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.3 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..1fd04b50c3dab4811fdfd22fd5311370258ef3b2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/27.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d136343edc397e37ed3f9605dae58ba603edfd54696f2f6c4093f5620c20267 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..7c9a9bb4846ac7c588a68d67463bc1b0437305d5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f79ad83a93620d11aca297552498ed79dbac92a749d5bc7adeb7741e9aaa481 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..db7eab37ebeb14605c1ece3af4b6745e8a2a4c0f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 
+oid sha256:fa18989637b85cb681f7dd05864a7b963a4322dc189d9439cd93fc09499015df +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..f2f9e075406a633918c230116d152aa4ee4760ca --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ca95252ed07d24dd2d80cf999ba2384197a66374b824c703823d493df7d7033 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..5dc35638fcecd699ae4ee9d61f0a658fb5b79a20 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/3.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2994a4f60dbb472a23dfa9492f31c51cb7912a8fd7bbd3b92b54c8ba492027e4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..3d2347af5c8a2d37d679ca48c225299e84b11699 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a053fb75b91c32c3a6c96f76a18a7f4f0a6e8371d2ed3633a307b8746bb489d7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..9c50ca3cec0158fea7d4dac3f8dab80d237ff3e1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e74d5d53cfdeb8e5a5fddc61aa61ca3ba11ced2b57c1e04ff78e7496f4f6eefc +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..cd3ace01e91e66c1b0edbeb1db1630eb84471bc6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6bc795d9460ece8fcdc8ccd720f3d3e34e4fb76e3cccb2347f7b66a796002ee +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..6485bb39a0abbb57740764375dd41cb4ec58b9e1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/4.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:75cae4474b4c4f4fe6e18227d84905b2e8ba9c430957f2e50499dfe17305b711 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ab35911467afee22bfe39e7d9eb15e2a88d9235c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb4d6915c64178555061c33b129bfaf9ede2cfc08f5dbcb5c9b711d31074fcf7 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..e112ff93b7f05f09a82c308ad81859eb5fec6483 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e88f3fc97a2ddad3f6526b4b370c1b65ba698ee8e43f7374739fc6eb39f2594e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..0aac8ad74f26debb3bc61e1ee359f5814eca288c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:29bfafa6014e7aff5b01307fbcac14137e11f8afaaf61836c48e7e0232cd8173 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..ae739a847b7d8128d42b94eac023939f1c823ced --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/5.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84efdef526d1af1cf6be7ffa49e3509f2cae1a545811bb92c13adb518c9e7039 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..01215c3139c913253f65a07d6fbdf1b861785aa0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5982542ca4606f9e51774b4c2d988d2664912451b0c13b5f1b0514f6db821d5b +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..6f63b3e07e314c7c3b6e9139098e24642c9d3cfa --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b9f37ba7df20e74b689cc3ecb7dc360d190256df2b4fbd8658badf60000a73ae +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..1e82d220d12de392c02504f876ff31f8be9b7352 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:d983c671982fbec648ab9747057f32cf76c8c6b33690ed900aae9c2151c7e2d3 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..4cfe59a829ec43d78f9967ef76ed0971d9df7149 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/6.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d83b98a2af249831e278bc6552394be7ac6f748bef8e526b9e8eb947e99dc9d +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..7e969821e947f05f3e326e82ac00dc2dc5f2f966 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f50ba101cb0712637543f3c029e028e2b9dd4c7541bf5938afb08ab7bc50ecd +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..9b1e826611c007323c604f67ab104cdf290ff827 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b751032a34777567a1cf5b2874839958036edbde414ac09e167833d63056fc7e +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..3da43667d0c89d1264aedd4ad89081f1b06c55b1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb497bbc3789189ea56d1ce6a57f4754abb666c4811ff9feccc0f6f0bd9485ae +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..a43ed7c8489d381f1398abfb7151487a07e42b9a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/7.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fbf5fa2102c37f9e3e4a1b2dcaca0c21fbb5602c8fc8a3ab474d7feda5ac69c +size 37748736 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..195ac44742b0702e3b3ff18de887108859b9116f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:716786305994d8c7bc301605b8f7181d227960ec07102a3dae468d21aefbc223 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..5199d05e09bb17f5d799ab1e294eb24d3a92df1f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:75e8bf48c972b82c1035f1b34125473c56ea2731afd34d28cf41f2d4ace6994f +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..958aef07142260d3fec2631fa61e219b27f4cff2 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72841f49a2162df9953d2e54c8c9e8fcaffaa060c2ad694d25a2be951ba72c32 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..2b1d0806d75b1cfeeba0fd47db7e8ab4d9592a1b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/8.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1ae76a0487276ca83406dafd1879d9ebe4320427111f54d3febaab5ede122d62 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..946bde2d6896191317139ee22d50f1cc43e6b23f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17c13617c6d8e9138b6302195947e3c9dc3882a86a5ce31228915ec07cae70f4 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.1 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..e1b69916036b84a2dfe4b429f5fdac5e20217638 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d26af34282201117cbb531ab78dbcb987aa19a80acd250ed95eb899e617f2d1 +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..44a7ddb381eaef590928c78c9983c356e4f301b5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06a9b89ebdb0f5fb961ee7b97553631b10887b8d8bae8cba664e4987f06c43ec +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..390cca72c1e58a33868359b68132d69226c5fe6d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.mlp.linear_fc2.weight/9.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:4aec8c9e83dd7849c33991489d2d2542950e0241e175a28c27572fd791feea3a +size 37748736 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_0_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_0_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdfd5deaa7cef9ec35e981f5fde6534cb556554c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_0_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63f63a650a1e81efb1222bec5ea786efd0bfb9a5e80530f442bf91f9acdbf8df +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_10_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_10_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a440038d68af2458a2032eeaf2e1fa79cf5c8333 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_10_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:227c205836c7bbcc512b59dc9008d0c412699bd03df9aebfee0b7bd3c3e329c4 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_11_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_11_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..609ac2f1bdff7c62b652fb7f1d0281a8e154bb08 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_11_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b05ac4f60393e0c09711fc249a87bc2e036047e8dcc5d63402503e7d1d662c31 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_12_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_12_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..8417bad1d79124c6d415b688488f813e9aa59250 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_12_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:764ad22a31d074c0951a6a5dbd73f033ca459ecbb5ac362236981e8fe12da56c +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_13_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_13_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..e01a6610c2cabfee5ab8f23b7be40f83fa72adfe --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_13_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e620f1086ba7f104be16365fc69e8487a32d8bd7acbdf63f87bf802447b6466d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_14_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_14_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7dc479dce1cf4d854fed381732cddbded4919da --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_14_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6910bdfa718b3600854184a6b1f32a8e9d9be3ce10c17f12fe6db79120786a2f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_15_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_15_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9dc5d5187f1ec2891ca599e47c932ee715c6842d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_15_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9acd9fb25e6ab8a85b6ab125c958ee5480710dab44592b1fb84a4eb69872a013 +size 1840 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_16_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_16_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b41c96bd9ae3456f318bd68a6b349bcac76020d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_16_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:885ef049145a29bcdab6fefc355a8e1236bd882c226930cf530a3ba0bd4ee721 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_17_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_17_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..4aa15a759da172cd1b03640bc9277aa61350bfaf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_17_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22bfd3234ccd2293b4de4762d491d064b46cb4558254220f7634545715025838 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_18_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_18_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..2881e7895900a5d72eb8a991ee08c58f20f253c8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_18_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c02c0faf91090dbe1822a700e9bc7f91f16e835f7f826eab91a0443d48cac46 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_19_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_19_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..49e7902ecb6a4b2823a4714d278459095b09acd6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_19_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf409a5924123568716ca4dcbcd3b5f1ba9624a1713a170c91d025cb035207cc +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_1_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_1_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0775267108891887248012335887ef50e474889 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_1_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e8b079d0103f30cd750bc238764389ba0c37424878264066e76bfdbe45c0562 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_20_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_20_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..aed8bf7a92522b09c2c6d802baf18c6420c7ab0e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_20_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2188e59e72780c58a3acd8e6ff7fe03f373d14a42a0e36c69f1583b349cbc4a7 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_21_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_21_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..54c8e397876158258c7e3d55678ec2b5b76b9ca9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_21_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5a40ef6b59bc7e2ac745df57f28cc08ba614e13e73c12c7a16ac9e3bb005a74 +size 1840 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_22_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_22_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c74304cc0514ddcbdfda7fc598edaee49f7ca20c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_22_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb827b13ea508701f2eb587f7c075983c97a7aedac472cc63f237756c5961c76 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_23_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_23_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..5496445b8f4849c98c75f3e0dc375b647d7e3c04 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_23_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a1927306a894bb424ec9e0ce4a672d7afb28d3bb9d5db1c6e1268c2ec58232a +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_24_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_24_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c68fd9e24c7965a63ad4bbcfa04638f17d6c974 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_24_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84ab4816acfe4bf4df6814b170665c0b14ff82e64c6a3db350780a5ee58961e1 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_25_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_25_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a1dab169defb9b542680af169499ec57bbc631cf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_25_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e0bc28d64fcfc1620f51ba7912eb5f3e6757292e45e4256c2d11914214bbc77 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_26_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_26_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..97bd93532dc4ef87c88fdfb3f5e5e4668e1ce3d0 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_26_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4e987570aaa427c84d57db99f8a2529370b6bb4bb6e61a5c0a280fc3f8b1f3f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_27_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_27_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c3239fcff67bf15ce49e9e79e187324a699095c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_27_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:addb78803cdfa5fcc25acb716a18e71def373b29af9d89294cc8bdda95b3757d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_2_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_2_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7e8b538fb206407dbb80bc34f1b59c33d352e864 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_2_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce21d6653df4f0be7621f065b58aa61c970bef1c6dbbcbf018391ba742f93e4d +size 1836 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_3_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_3_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb6af6b317ec0581d251e39fc0540030056be26d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_3_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e17ee9ca84caa8e84a29eae3086e03cdcd8242fa586638054520904b7d5811a +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_4_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_4_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..754f4bcfa097f57f7181dee82f61b5288cb5671e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_4_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ceab42286b21b2bc4a581e3b7a3cb3de527ddf20deea8354d87bef6ec9b8b648 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_5_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_5_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..58e175f50a43d4c1deaeb2acbfc26921669fa447 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_5_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c10d49188641f77ed026550da10a961d3e7096ed1c71fcdaf8c4a4964dd5d8b2 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_6_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_6_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f071d0d9ee8e42d35e1ac4d622c1d71a4ba7906 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_6_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9b99c08a5951d88fddc30f73527ab22295e4bdc040febc36a47616428f879fc +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_7_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_7_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c301c90b7efddcee8db52cbcfc196386ac6c216e --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_7_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65b2a3db54c013bd9849efb2db9c10758012e0f5a5d1f31397e97482756600e6 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_8_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_8_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7ca746be773968cf1efd88a58dbbc7850baa321 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_8_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9e62e3c64c74eceaaa2bbd8b35484b88271cfe637474693ce93c978f0bed7ad +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_9_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_9_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b11a1c96a58d2a4085c34fec35f3ac4e9f30524 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj._extra_state/shard_9_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0069d31c4f93e27495b18776b0c2fe67027ca0e663f5174f085b69c0cd60df36 +size 1836 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..8ba9182c49d4d24957ad5a22c7220ce81052cece --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/.zarray @@ -0,0 +1,18 @@ +{ + "chunks": [ + 1, + 3072, + 1024 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 28, + 3072, + 4096 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..f7e250194d94aa5927bf11a23610f320f14e56fb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee045d249f25475364a1513f6352c99b5b208fc6892a05c34668d2fe274d15dc +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.1 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..5933355ce093470bba1532d8b6fd790eb9dc1904 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9415a800bb7333eda3f4db77727c6402b192ef4bc62fa87853ee2847ce97eea7 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..fe49f4cdd708fd0cf1f89ddab4b929ee4703f153 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1ab0283d9c2fb310ae92f1b9b8bb37c0a11977fc8d5e63b7d8d2f56eae0a4eca +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..da0daae20dfc20bd63c0b2e81ca904fd9fbd917c --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/0.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37fb07417ca7c9092b939fa0067fd108da16338e6987abb9fb98854e807b757b +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..635ec431cb90e8616c7f6571b909f555887861fe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:956c57902c7733e528e178cee66f0e4c78ebc92fd68a9fe789eae131993c09be +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..17a2d154cd60f19ca919ce6f22597c784cb34fba --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b66ac991b193012a1d8d16cb43b3880898a92aa070e4aefd4a976f613fc2910a +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..97feca077ebdeba6e3d63c18f6ce0241e0d61029 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf3f523e09b2a42dd243411962c90123f57329f2518f604bc50500c224376b04 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..dfd903e91d6625386fd93dbbe02e666c1e9606a8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/1.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc1ff32f90a37491e9f416b8ee8d3ff1e623188f7ae3252b5a59769b2e37eb48 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..7c239ae71798135cf463ba8c4c21b105d0acc5ab --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:934c8389e140beba3673f614f587911fa43ac906677e7f077453074490a56bb2 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..dce017157a09b3b657ae0d218d7df894f08ebf57 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f417d4b9246467bd79dae51b1922844c80f296091ef875c5af460c77e22f909 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..ffca548055cb8ffecf67f7603320b3c2ca09294a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:650bbf0894f279deed78900e40749a4d9edc3869c013e9df592ddf5974e14383 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..42e4650032e3003c152bedd1aa177139323f92ec --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/10.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:79b3e5158860ea60f1b0c02397bb07e01ab9744d6fd9704eeb55ec64989ffd06 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..7e175dd1b559f7f8dfd6a5e1c42fefc56996b053 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8438167871f574987bf310173e5166b7f4021a7f8c865bb34ee31542e3af44ad +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..88e0b00015cb2f30833ab95957c9e3c0f8dffa80 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ac13fbd97cfa9018594dcb9e22b89586c55b18a7844ee59efbe04fae2376513c +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..1708800bd377b1c5efec2fa374501b1db08013df --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cff1f06b40ca769ffef206ab4653f9ce3a2b4f664573b64d686170ba17459729 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..290e4f05d4ef0792d5a33f23fd1d363c9eee3cde --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/11.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b3626a17869153c26cdc0ba8a3641521664aa00488dc7e0b465cb2143066ea8a +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..da80050434f4b1d258a4d8ebf3d1aa36fa6b825e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f790aface1f3d097ffe4a2000391df507e2b27ec2723deb8ef5661b74ed9abe +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..1c6827f9b822a6d6c234860cb53b293f92d9ba09 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:172dcf52ee813905d472a51eab8315895be8268f9c12e71565bf6a7dda57fff8 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..2abd52705d716260353e70bc38f84e5222581ddf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed32f8cb665bcb56fe3023116f96ed299f9d9bd27bcbdf7295ab7341086f9910 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..0b5e638d8070664ea5e9c52a963bd335def11ed7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/12.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66e0aa36f94028e1c6a19a3edd75c702097d4db71ba621131cb3a75039b211f7 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ad9c20a3aa9b20fadabe29b1e88e728e05e56985 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:729aee5cd7a61e8df2dc5d7dea5193e24d6ba7349e84124b51720a1e3635b153 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..26fa1699efb81187e910b67affdb699f92522e20 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2ee8f0fb8ebbd1de982862280e00f75d7fa7826714655f70ba89a1aa4e42cf3 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..3fd0bee011385e9d0572205c255918e81f952c91 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d1f9b7a0687d3bec3d3f0a63b60abd3a30a4b3ea6422592ff11ea8895beadc0 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..2b4d77a6cfd3d62c14cf69af8fd5a7e0a453c835 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/13.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b95aea906a85eebff9d83f86797db357cd22c617961afdc90909438e14806f0f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..7afb2e7783ba552ad6d97d3c87d0aae0a91df4f1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf069358e9a0076b4311767d05ec2371c403a682c39116d749f6221243fcc834 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..7d9e253dd3a836b0f9cc76b0e5766085f3f94cb6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82f3bcd0121528e0941a84162f2802bb5acbe822064f9d832ccd827f5191b23a +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..bfd9cb219d196ca1d4cb97d8c0480184b5e9d847 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6ec72e3c9ac98fd82f7419818fcd5d6ac955fb0acd8c90af3739f009f7bddba3 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..2ca1dc134489d7059edd4caf7b8ba0bcb47e6676 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/14.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b83447cd484ec4fe9b88d5c47f4e78c1b7835c735e388c469b0d4c26936983f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..2d9824aa33331eb89bf062b05c78ca33002f402d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00f3dc8aff742c08310ded840551649348da1dc831c173af9e596cbc7dfac2c1 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..6577501adbdd588dab4e2252b3bc5e163f48e933 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2e7458bb483abc7f10a9289f235ecc83df1b9ed4a53f92a8ccef44624d579bc2 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..83788a72d4dd13fff5621eab5832c38b5785b7ba --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf5d02fe4d501c643ff404b59d5c3d0690d329fe890ee7daa152eafe004f054b +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..03fa0676dbfe5245ea21a04351f9060b524a9fce --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/15.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5bb7abb47469e654564ecbd791947fefa2d60f90ae7be8dfa73cd54bbb8295a +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..d2c7a86f0af9d81fcac7ab1248e42c7feec0fdda --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de2d79aaf70050c9e6fa8b980b8aca9ae46b27a0d62d45d40180fddfba1c045e +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..2b9a4b91add19ef045f11657af3b60651f1984fc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1204be7ca8755477b69df0531220a75de25ae7cbcde0622a5cb69f32e5f4ce59 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..ccc7eac4f1c50b5485ba2bb9965c9d495f0e328a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b05cb37357273d35bbfed55c090a8e0788598e5fccefdf3da3e930f7460be03f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..f0e059d1da34adb91f5733116616bc7516431a2c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/16.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b01f447546fb34a9426e35be86bb28ea8ae7569ff66c946eba9312d97d6e3b02 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..beb29cd87626645ebe124599c667f55b447e1fc3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7b55415c6a3ff373e246ed239223d74b314f4e2f8c6f603dacdfaa80bf438d1a +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..b7d92a36154a49fd54f3bf9ebfedefa291f9b9a5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33c320c4dd107d457308df9e72967c7eaf95b70c2d973225d4e180187b34b061 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..da6035e1da8d11c25fdbb835f87a48113a48d594 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ba232c1692fd5b93c0c30fd35620b4a9ddcfefa5549929e201f5ec559eaea049 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..c839f9e80dd04efe204328647583507c598d9fde --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/17.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:371bcbcbb3a4e9435c5860a7f999b00a4778221d75d7007d5a94474b0e5847c8 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..4d9d26fe32e706756598ee53dbb4a0a6af383e64 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:88f45fd4855636cb01af77f1b69547c49e4d1e0d791f6f9279cbb2e2647d03e6 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..73c77d462a32a87865b01cdb11d16a422252c22d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85599cdaf49f8b6f9a32931d781fd1cbabe4039043e897d5632c182c580318cb +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..7f1946b14bd1c6ddcc3edb28d2155f86841fe84d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a45d2f4ee4c3512ba879a397530ed2fc3e0c2a7b7bbc2471664f4f734ff6564 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..8997b8124a93e041393503cb6b6e56a086363faf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/18.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8171cc64a95e08b6b0e24e9497e199d6f7400addb451515612b8f6d074af625f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ba7e9625d5c45024015ce95af74e402007ff2eec --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:061060e27699d89f520fca6c3b2491f2412769137bfae1bdad4b29ce003f016b +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..938e917f7ff990b78330859a17dce350f15dc820 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2a2d3e23f12ebb67e56abb19d91cd3ff81d0a5d571bc914634e43e3ab01050e +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..41d353b34a65bc861b5f22d66e86ad2dc622dd6b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8723b4aa3fb08c0d45f7abf2fab6d887952dc7c3df02b0976a226292b741976a +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..52e12c47a4e1d65c0df0362463090a71f238d4c1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/19.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b4351d71016bb23a4658293bd9c791bf1da5772c34a936b010594376f33c611 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..1e8a48692257f63ea108601dd7cf0b62af0ea743 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3624640974406354f2ef83967a337e4ffa1817b4051a7d95da7d989eda7b1058 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..6c4c9e7e460cd340d2b27eb2115290b8a62d493d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69ef156d85d158ac88add90ef1caa5f2ed0259fd5450c493584b76ce1b8f3217 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..caebff4501465121d38353809d5e801902b3a44d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39e0d62c806728927652c3db6ba79715d678e98739a7225c25a2b51f69f5c05c +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..4930b39d6a7744752609adfd151d9c1012d6fd3d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/2.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69c07d28477dfdd6d00d2e0eae33c702d154ac6a191adbe497813d4d6713af2a +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..4dcd4e3eebdf129d82e239ca33623cb808a31962 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1cc659d3c50ab2ab8a0372d429a8737d85f232baf3ae9de0857a1707ce90661f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..5c52a0a7df6f743ba6337f09d70bb4dd89815ccb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e577cbe103bce675ba45f58e7b936ab3e6d55c43057408a439d7345b8f68ddbb +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..c2a5adeb4f1082bf9cd8a212bbc81823efcd7ff7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e33030ef455adcbfd9437553cae1895277f2427fd271d52c8be688e50ad1a798 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..ff55e5401aef31d72af5280b683f72a8f8a8c3a5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/20.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c8ddfbb2952d00224202297a56d1a09ca24d86ca06b18cff354331971cadeeb5 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..37f7fe1590814efdd07c54325894db3ff109d588 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:695a675c5410957a232f4a9c3724df0ffb5a1631ee46e8f9d50d0bfd771c9f05 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..b4872bab37dbeaebd551f1464ba05932dbe779cd --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7c28b379868af21aea2bc810c0e5f16d287fc2a87743c1ec7297ec4aca9579b6 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..391f73a40878518a3d5016aee093c4f8c64739d8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:969921f99f7d6faabf99d1879b7fed2691e4c08720a8e9ff1db4e8d174b40be4 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..80c8c0684227289cc29699bb756666f5aaf65ae2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/21.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4338354bd6a2d80622efc27e2365c9871cd437b58ecad9197e715af50f6ed46e +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..8b15f46f52201003b1458fc69224fcc458b8ff64 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad9330e62a9ab478bebe875496642484ed423543712e1d47e0a53aa246ac366d +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..f516b5c53c70bcb8fdcb5731f337dc89be5169ac --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2a8d48d5cf9e05d28d12041cff1ce5d6ec2572be7fd57790835bc078c3d64fc4 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..326b0f75739ac28a511c5337cb61976c7bc0585b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8893322f220d33e65ae1722d786cf73e60cc98d1783bb03a5c80d8fa32adc7d5 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..1961d2995fba09914917a3259846ca140d505254 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/22.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e8aa3dbeda73928bbfbd85fdcd0ca269a84b4d982e65b33d80d8db13254fc75 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..4c1640a60117d1e213eb9a0a3e19ca5304ae04ef --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e4396e6b904d531c017bca61b2f04b61b1ffdc7935a39ccd285cabaf83f11192 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..4875867576378026a06af2a4e04faf8f411e6a20 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:62402992e5fb321922f94392b316d4f062e2fdeb8de6c2190c374a40da6a7074 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..c671559ac85335ffe7c89b188f8a71193ac45cb6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d774a401b7537fb2a71f23b8708663e9753bf0297eab735b057be09ab028ff58 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..69228b3b159851c6dd0897664627d5f48198482c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/23.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d74dec5a0506d185789b7fd5366165e63e35016a0a87954c5708e28fa7013bf1 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..77dae03b3afa7cf91b40dd9362812b147f4f386a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ff5ebd40189bab2b6175f5ef7cf531c065cf63aaa9e1667a23705d19a992458 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..4193fa50c2ab5f833b20bf1fb189cfbf55801ea0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daaa93af9de12196af2555d0a1a885cc0810646b4906bdba8d49dc6411fba1b0 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..f3fba5cac2ab6ad7281ae5f164a6b93a3e2553c3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e0e886d3b23bb890d5567430b7c8faf70f6f78a8c570b901a2aeced483bfa73 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..99387ab6707efe0c66e3d9787f9384140b21e45d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/24.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ec46de3bd1f4661ba6e2322af385e3b469db8d52fdc13b680db7e866473a44d +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..d45977e07676d6dcdf223a4bced664a7d46f3c4c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9cd800f77a8bd50b82b227069279877f822b5a04eef6e0c5a3b93fac39f46fd +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..d7004174cb23bc8b4b7bbd178a9625d772bc1e84 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5ba632be568fac8849cde9b4c7a860f6e5bee3a922d1cd8217e25205b0f2caf1 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..41d58398b55603e0bd602d7f23cffa3bceedd9d3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:172a14890586a93dabe5b757f5f1adad2895a6ca92af8cd4a5ca26eac391d987 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..9f55890a099fbbbe92e74059df3fa9bdd6c113e1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/25.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4d1080d3d50543d627cd165abcc325f760b8d1326c9e816086367253cf17655 +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..14cf2b347995679c88378bbbe43ad8335d19c61b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d602d88a71dc86f93f9c979e50379fca75b39e461ca335dea442ccbe03c635da +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..14b77c50b35d4d442e59f05c13c6768ddd7beb11 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dc4fc3f819d9bb43cf3aa963967bf71a732dd476fb8ad511a7c00a3da68949ac +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..f1649c9d9d56fa838ed7b922faa52032f2a03089 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:20c3b2f7f8384aca2e1ecdec9ecfe9d4d3ac22baff3a71e66777554343b09b84 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..25882d232d2f3d3bc0b6ba3ebc7f93b35d57f0b3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/26.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c7515ac825869abeeab24f6d58947a4e2fc0e2effbc857dcf304e9399f66a125 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..1861a592e3ee159fc9b4c4850d294e573389bc89 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:de1c4b0cadf0d6b2f55ff18761b689319d4893a2fdfd090fd2915ca565e57d5b +size 6291456 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..df8f037b922e1d0a692ab98a53c4362657e93390 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4187ba3e57671da32dbf6beb53740155c0a153c207192a2a6ce97a6e3848c082 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..4b8c76d5abb23e885246594cb93019f5c1ce9e2c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22bdec8d854e56b73a69930e8da127cc48f26d7f3f5d06a641777633cd62cd44 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..e60da00ddefb0f21538b965265629b356d908308 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/27.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b87bef247e66eedc2e1b38cac464f4f945c42ce2f49a31e04aa376767c4bf445 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..105bee26ae59e02e09a47a492bc057d7903b8097 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47337f81b45bcdf4c2cacec549083f7c67af1e70fde0a24a969bac471fbf9b5f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..8d3c4009a9b686fce6c7b683fa7aba0c8839c706 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0eff73404e323a01d9bf33daf59b036775c5de936c689447dd110c53152f33e2 +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..5edbd09e8bf7014b4a12a284dfb8653262706310 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ebead6d949b99c0422e0daa704feb31e0ce58c3acc5c8658a47b1c41b93079d3 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..aabf4f73bfad13dfbb62853f895c60746e0d47e3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/3.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c2b1beaf1f804578a950e7c711f29e76198ba7bd1c1ddd06ed1a91589677647 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..f4ba219189a3a4e5e34f60db9cd3e909b9e8063b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51132acef98116d283dee07fb3d4c6e74ffe023d34b979936a2f0cb198527ac9 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..186f0cee3079b6fdf8df0a969e95b78e45bd1b4c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5aa1d1e3d1604c84c067c485e80730ca3387d000116999dd14181b647a5a62a8 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..9c4e823a845e8367ad7a4e4af0406ed1c5eb8cce --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2a33b3f15321916bba987c60857fbf5f9f043e09d9d25a8e14baa638c86963c +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..08d23a6d0101baf5ccf2c4bdb6c5d952f95cdc14 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/4.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f0650c4416db0aef2f4d32d8ce2b24a7c29390cd6286174e68d7e66e7a9ee902 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..a190fe333058909407f8f4afbf1e08e56e1bc13a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a191a361bc0f762e243ab95214c68b68e46eaa4ce64fe6325701373e6303a07 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.1 new file mode 100644 index 
0000000000000000000000000000000000000000..83aca413142b5940d13f380e70770185d84ac085 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:99db5a3938196ba8a636bd4b39c94b55d66822720ebffc5c3c25a634e301395d +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..41ec3cd3718b3e67c19192a3b26bd91debaed23e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca43a3345e889c3b269ca42ce2231469c7b8494ec3c4caed6b2e530f7983e7ec +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..34cd53d095697a16a1a169ce36df71a266bafb76 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/5.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b248369270e59fec8ed82f57b13300852e49e983de8b76ea5cf03ae7f2098da +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..fd6adef76ef0f71a24ad279a1d681319a7f6fe47 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:983c6205df98d60c6474edf8181342a0006f86dcaf60111cee1d7f46707e0d23 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..5fed8532f5062063d9116e7688dfeaf7e1ab722a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:91a1a1047ada31c216e3b1ff3ee8ad3df8277a2287df8e1f58e11c5053e5dd5b +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.2 new file mode 100644 index 
0000000000000000000000000000000000000000..418985a73f6d470767ac1747aefbd117e21fefc5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0086e0985db49f18b8d4ab9fbda360c617e4c9046eee47fa6a67828b6f0979ef +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..bc2009700c5957afd36449503235092efd654805 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/6.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3ca9643ea9313852d63890d27e700601e56666364c9f98ce2beacb33645081d +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..5dba9a0e857b32ffaf0c50037662b20163d12cad --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef861acfa0649a447e8df6620127c54cffc7d06be8d7005d6cec8189a983876c +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..63f2843a78ef114112f22dae9bfb797e39274b60 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b8356c569cd5534eefa24aa7db49b3b80d34d5c23f33aedab00c5a5207486ff7 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..5fd733bb7f55baaf250e04c36f72b298793b74cb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f4064b402df9179881ce60c568229ba4f3e005e2c421117507b64e82e039da7d +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.3 new file mode 100644 index 
0000000000000000000000000000000000000000..d09aad7e3642fa458927237fb8501b8e9d753fea --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/7.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e91a6f842886c51437314fe62f58ab6aa75a7c41bccdc91e6018e90655395f4e +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..73da0207ea369d6379692ea158c4dc40a230b71f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1e3ac0e351f01ddab940987dd61673622a20ed5621aecaddecacc2feb1236c13 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..0bb88a1d046b84882b7c063f639e7a670079ebb2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f1c01112754e06a1a0c59229eecb5bd863fda34903597c539e6168b8e9f82e0 +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..6ddffa5ad7234c051ac2e3db6d267c608d412646 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c90aabdf8c6f9caec65630ebe8d6eef525db2ec773a05a35dcfd8fcdf7819e33 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..61a934ecb1298208f1825fafa5d4099d22d73435 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/8.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:885c71b324ecc0136ebbcd1e1abdf45064f84176ff7b0acd1b68b46fc840f50f +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..0aea80e07f4144adf4638b6f04734842820d90e1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa7fe0fce4e05084ec90e3a0ac29d77d210bfb51c2fe988257ee35da1351f795 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.1 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.1 new file mode 100644 index 0000000000000000000000000000000000000000..9c16fc5f3f688084c8f987438cf804dc71acec1d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.1 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ac261b2e6e2d932eacbf777c982e5516429591300a53bead5171e4dce065d542 +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.2 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.2 new file mode 100644 index 0000000000000000000000000000000000000000..9a0b43896097fd4225c169e18908d299c9d7d9e1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.2 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6b20dcb320d9b11d164a17688e94d0afffe08934370b293f88e2c2a7b502d98c +size 6291456 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.3 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.3 new file mode 100644 index 0000000000000000000000000000000000000000..23061db9e99133e15cee4fe09243d52731b3685d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_proj.weight/9.0.3 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c39ba2d654703f898ad5c721a20926caf689790682c7609e2451f4f9b87375c +size 6291456 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_0_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_0_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..bdfd5deaa7cef9ec35e981f5fde6534cb556554c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_0_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63f63a650a1e81efb1222bec5ea786efd0bfb9a5e80530f442bf91f9acdbf8df +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_10_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_10_28.pt new file mode 100644 index 
0000000000000000000000000000000000000000..a440038d68af2458a2032eeaf2e1fa79cf5c8333 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_10_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:227c205836c7bbcc512b59dc9008d0c412699bd03df9aebfee0b7bd3c3e329c4 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_11_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_11_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..609ac2f1bdff7c62b652fb7f1d0281a8e154bb08 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_11_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b05ac4f60393e0c09711fc249a87bc2e036047e8dcc5d63402503e7d1d662c31 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_12_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_12_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..8417bad1d79124c6d415b688488f813e9aa59250 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_12_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:764ad22a31d074c0951a6a5dbd73f033ca459ecbb5ac362236981e8fe12da56c +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_13_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_13_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..e01a6610c2cabfee5ab8f23b7be40f83fa72adfe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_13_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e620f1086ba7f104be16365fc69e8487a32d8bd7acbdf63f87bf802447b6466d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_14_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_14_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..d7dc479dce1cf4d854fed381732cddbded4919da --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_14_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6910bdfa718b3600854184a6b1f32a8e9d9be3ce10c17f12fe6db79120786a2f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_15_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_15_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9dc5d5187f1ec2891ca599e47c932ee715c6842d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_15_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9acd9fb25e6ab8a85b6ab125c958ee5480710dab44592b1fb84a4eb69872a013 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_16_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_16_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b41c96bd9ae3456f318bd68a6b349bcac76020d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_16_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:885ef049145a29bcdab6fefc355a8e1236bd882c226930cf530a3ba0bd4ee721 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_17_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_17_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..4aa15a759da172cd1b03640bc9277aa61350bfaf --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_17_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22bfd3234ccd2293b4de4762d491d064b46cb4558254220f7634545715025838 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_18_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_18_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..2881e7895900a5d72eb8a991ee08c58f20f253c8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_18_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5c02c0faf91090dbe1822a700e9bc7f91f16e835f7f826eab91a0443d48cac46 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_19_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_19_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..49e7902ecb6a4b2823a4714d278459095b09acd6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_19_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf409a5924123568716ca4dcbcd3b5f1ba9624a1713a170c91d025cb035207cc +size 1840 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_1_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_1_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0775267108891887248012335887ef50e474889 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_1_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e8b079d0103f30cd750bc238764389ba0c37424878264066e76bfdbe45c0562 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_20_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_20_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..aed8bf7a92522b09c2c6d802baf18c6420c7ab0e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_20_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2188e59e72780c58a3acd8e6ff7fe03f373d14a42a0e36c69f1583b349cbc4a7 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_21_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_21_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..54c8e397876158258c7e3d55678ec2b5b76b9ca9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_21_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5a40ef6b59bc7e2ac745df57f28cc08ba614e13e73c12c7a16ac9e3bb005a74 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_22_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_22_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c74304cc0514ddcbdfda7fc598edaee49f7ca20c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_22_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb827b13ea508701f2eb587f7c075983c97a7aedac472cc63f237756c5961c76 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_23_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_23_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..5496445b8f4849c98c75f3e0dc375b647d7e3c04 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_23_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a1927306a894bb424ec9e0ce4a672d7afb28d3bb9d5db1c6e1268c2ec58232a +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_24_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_24_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c68fd9e24c7965a63ad4bbcfa04638f17d6c974 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_24_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84ab4816acfe4bf4df6814b170665c0b14ff82e64c6a3db350780a5ee58961e1 +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_25_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_25_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..a1dab169defb9b542680af169499ec57bbc631cf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_25_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e0bc28d64fcfc1620f51ba7912eb5f3e6757292e45e4256c2d11914214bbc77 +size 1840 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_26_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_26_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..97bd93532dc4ef87c88fdfb3f5e5e4668e1ce3d0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_26_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4e987570aaa427c84d57db99f8a2529370b6bb4bb6e61a5c0a280fc3f8b1f3f +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_27_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_27_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c3239fcff67bf15ce49e9e79e187324a699095c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_27_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:addb78803cdfa5fcc25acb716a18e71def373b29af9d89294cc8bdda95b3757d +size 1840 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_2_28.pt 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_2_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7e8b538fb206407dbb80bc34f1b59c33d352e864 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_2_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce21d6653df4f0be7621f065b58aa61c970bef1c6dbbcbf018391ba742f93e4d +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_3_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_3_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb6af6b317ec0581d251e39fc0540030056be26d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_3_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e17ee9ca84caa8e84a29eae3086e03cdcd8242fa586638054520904b7d5811a +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_4_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_4_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..754f4bcfa097f57f7181dee82f61b5288cb5671e --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_4_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ceab42286b21b2bc4a581e3b7a3cb3de527ddf20deea8354d87bef6ec9b8b648 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_5_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_5_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..58e175f50a43d4c1deaeb2acbfc26921669fa447 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_5_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c10d49188641f77ed026550da10a961d3e7096ed1c71fcdaf8c4a4964dd5d8b2 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_6_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_6_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..7f071d0d9ee8e42d35e1ac4d622c1d71a4ba7906 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_6_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9b99c08a5951d88fddc30f73527ab22295e4bdc040febc36a47616428f879fc +size 1836 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_7_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_7_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c301c90b7efddcee8db52cbcfc196386ac6c216e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_7_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65b2a3db54c013bd9849efb2db9c10758012e0f5a5d1f31397e97482756600e6 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_8_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_8_28.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7ca746be773968cf1efd88a58dbbc7850baa321 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_8_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d9e62e3c64c74eceaaa2bbd8b35484b88271cfe637474693ce93c978f0bed7ad +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_9_28.pt b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_9_28.pt 
new file mode 100644 index 0000000000000000000000000000000000000000..6b11a1c96a58d2a4085c34fec35f3ac4e9f30524 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv._extra_state/shard_9_28.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0069d31c4f93e27495b18776b0c2fe67027ca0e663f5174f085b69c0cd60df36 +size 1836 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..5b98056452be6adf83cd241da1380f6b4effa63b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/.zarray @@ -0,0 +1,16 @@ +{ + "chunks": [ + 1, + 3072 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 28, + 3072 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/0.0 new file mode 100644 index 0000000000000000000000000000000000000000..9684076f2c581071ba4d714155c2876568c66f42 Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/0.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/1.0 new file mode 100644 index 0000000000000000000000000000000000000000..c35fc428541d7c237c1542b8ea2c237d397bf4a5 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/1.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/10.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/10.0 new file mode 100644 index 0000000000000000000000000000000000000000..03178d4291a7b99c39932ec416681db73bfdf40f Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/10.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/11.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/11.0 new file mode 100644 index 0000000000000000000000000000000000000000..4200e9a5730406d37e4c3d50acbab70b64031012 
Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/11.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/12.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/12.0 new file mode 100644 index 0000000000000000000000000000000000000000..d7c920f2394f6f8741cb8bde496e34575940a0cb Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/12.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/13.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/13.0 new file mode 100644 index 0000000000000000000000000000000000000000..7e877356cb0503ffac645718a71a0b42fb450dc2 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/13.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/14.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/14.0 new file mode 100644 index 
0000000000000000000000000000000000000000..a857131e03e628b9b4c347b4eb9b3a536143c71c Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/14.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/15.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/15.0 new file mode 100644 index 0000000000000000000000000000000000000000..054da3ad26496e4f7977dd7af07fe80875470c4e Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/15.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/16.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/16.0 new file mode 100644 index 0000000000000000000000000000000000000000..8ac49dc80bd970f896e0dc9596838862181cde25 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/16.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/17.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/17.0 new file mode 100644 index 0000000000000000000000000000000000000000..fb5c6f8620d12b0589ddb36aa8114a32f3b41703 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/17.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/18.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/18.0 new file mode 100644 index 0000000000000000000000000000000000000000..9556a9936cce2127a75de97f4971d700176dc9a6 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/18.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/19.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/19.0 new file mode 100644 index 0000000000000000000000000000000000000000..2abfc9428f926307bbd8a6898ee1e73abcd92e7a Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/19.0 differ diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/2.0 new file mode 100644 index 0000000000000000000000000000000000000000..af130d21b1cfbba04fe271a67af8bcc8951a0b14 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/2.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/20.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/20.0 new file mode 100644 index 0000000000000000000000000000000000000000..038adedcbab13f892f273966a6b94de2f4b3a1ec Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/20.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/21.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/21.0 new file mode 100644 index 0000000000000000000000000000000000000000..21fdd7c588776595a38125fd3dc44fd56f67fc6a Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/21.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/22.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/22.0 new file mode 100644 index 0000000000000000000000000000000000000000..0817768b3dd92b54ab2cc29e31406444ed1775d6 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/22.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/23.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/23.0 new file mode 100644 index 0000000000000000000000000000000000000000..a13e994cb3003eb58de9207a12f968649303220c Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/23.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/24.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/24.0 new file mode 100644 index 
0000000000000000000000000000000000000000..1fea97f3b9c67c42a233256e5f5a1376fef3ea8d Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/24.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/25.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/25.0 new file mode 100644 index 0000000000000000000000000000000000000000..c8ddffd41f7897b487ea1032e53d62819c44790b Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/25.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/26.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/26.0 new file mode 100644 index 0000000000000000000000000000000000000000..00082a0e9c1b5ab9dc8ae5f4b1b29970585c8da1 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/26.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/27.0 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/27.0 new file mode 100644 index 0000000000000000000000000000000000000000..9c6d2eb39391c51212cf352d1f973b3d52a146de Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/27.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/3.0 new file mode 100644 index 0000000000000000000000000000000000000000..016b4157c98a261d0db8dbbd1961fc3df7f5a3c2 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/3.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/4.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/4.0 new file mode 100644 index 0000000000000000000000000000000000000000..4fd103d4c0fcee44955fb381b17377cfeaf3b519 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/4.0 differ diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/5.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/5.0 new file mode 100644 index 0000000000000000000000000000000000000000..e6085518e582c6a5dd4367cbdc607713f7022d67 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/5.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/6.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/6.0 new file mode 100644 index 0000000000000000000000000000000000000000..92711f31a6099a078a4c6188695076fa04ba2b90 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/6.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/7.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/7.0 new file mode 100644 index 0000000000000000000000000000000000000000..2249b6512143c5746c5c6644e7c5baf570021017 Binary files /dev/null and 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/7.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/8.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/8.0 new file mode 100644 index 0000000000000000000000000000000000000000..993cefcfca277f916779f9880c8017283dce9f58 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/8.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/9.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/9.0 new file mode 100644 index 0000000000000000000000000000000000000000..ce55e0f828baf356b2323649550cf252e90d64b4 Binary files /dev/null and b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.layer_norm_weight/9.0 differ diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..21a5d6d1208316c2c3c505627484bfaa44255fdd --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/.zarray @@ -0,0 +1,18 @@ +{ + "chunks": [ + 1, + 3072, + 3072 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 28, + 12288, + 3072 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..792e843a5fd8cc705dcc55cffc4b2488e77fec56 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43c7922adddd2a6729d27179cb7237054053d0cdd22c9307916fcbd00e7f81c0 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..fbee0d703832319dfe5d198f041f7fda1c06e1a8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dad46105bd464c3d4d17d93bf5d73d4af0842d19785153a98194a69f5faeecd3 +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..89afb9cc3c4bdbafcc191c3b1c703790f815667c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:234323946443cd85f246706722ff872642cf409097bc794a6967994581063b4d +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..04e79a80e16cb33e8db6b5dec5122feece646708 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/0.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22290040e5582a4f118f74a718b3f14ca7c54d0e4592a7c8d3f0a39a3d6e1266 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..a7c4d99253cdb15946fb08a4deb4fea03f26e565 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ada3110e8dc454fce4e00f3b7c6d33110bcee9c70640dcf26992861daead9484 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..edbf7f0ec4913835f5a440e25ea9431a1c6fbff8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2e5de0359253dc8b80b447229e80b9fbdd8bd5b511a546ec1092dacb85776029 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..7750d3d646c80365fe3c76fe0cd758a0662c755d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:336f77cd52ff2317a77df3a89b1230c30bb91c6228556d8720699a6b5829af8d +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..c299cc67193ff79c08efe4d31f20fb1a974a4e54 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/1.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2c822cc7afb36c11cfdf69132c94ece02e741b11fa5f4d4b9a4c11290f434fd9 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..8e9df30e90c7039899b290d2406d274606907883 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab9be22a49fa528e95aa832e2d2ba5af8bf7a800693af28c1d434eb1ccfac823 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..26228c63d37667af0ec7d61c6e40147be3539919 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23082c289b320e06c26f98ec80e86ff294524d3c9340b10a469f9ca2e87fd4a6 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..6b5c578647116bffd4b15ac092c9eab50d7aeffc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11488cf9058f73397a73971efbe2da2dcdb2f3cfbcab32d8d7bf1d567105bcd3 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..0957f9cf8c9156095d50ecad4d6d86f13c8c5abe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/10.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8db42b35dc28a33c0527002a5f8f2cef5d1ee4a677990043221a8c9f92991685 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..bec5e63aad711e5b150f7bc5c8a872d3f329f90f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:765b4333a66bf9f13407f93cb02ec9256ba516fcb147bfcffc1d92bf745e0314 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..7df4d228329112e0e1c5805742409208bc7a7db5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e0a42c6811be7ba9bd2e6daed3f65db4ad6a85a5aba6bac7cecdbe9066531b5 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..401326afeb05a77868822d00fd9f3a3386ecf259 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b37e1e68370e1f684dce879f2176fc1c2366afa918e70c67122c7024749b647c +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..c8d517c9e8ed7f29d3fd12a32359ac565ba6c0f8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/11.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2647c76c5e95e36a1ac4c9444d156c27cb5b1d984a783c78447dd85b76a54f71 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..df6aacf7f3e0e8f0a6bc8a50ff7c6be5dbc3214d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d3ae8e57c65a99491a6bb97bda3ecaa415f4b6851a3489ff5e62c7de4e0c5944 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..c97ab6475de9d83ab30018a63973a910f80316fc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f265950c10992170ec27847380ebab30eaabff05bd053a8c82d380eaf1dd5d19 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..2d2f4c4467bd1693ca5b87a4ae80ebddebd23c9f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2e9c33986a82bee88119ba6f490e780d512c82320ff5c5c2c89bd495c0a7bf3 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..4b55bd3d21dc5723fbf31a301e87e1e235c0de89 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/12.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74095793b589ee725cdb2d67d55845c1eb1e799290a81f28aa7d148ba73347cf +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..2b237f674ac9614139d6893f5b7c26f736e57c79 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a0b45e139ab0563b83b50b4f90ccf0f3c2fc2e47b6f6cfd07fcc7e8cdedb5b72 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..48fa06963313b032cec9252f0ea95d091b33b05f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0015898255dcab87a85d9ab9c1f110abd1e7ffce623c8f400e7623e54e5350e7 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..e57e0c0caca77dca76261b508b78d1eaf6308e1b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:acaac0307b7b0314785626fedda3644df5be08c0570622e8db38cec90d6249f8 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..912388fba089ee686aa7a981b2057b402f163d99 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/13.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23855e5dce51af53476a7d8ddb810d33e0944a6be43d3743504dc2108be0358a +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..6a4e266d8925239241bc6fe1f399277cb13c9033 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd0d633f98ad3c564e67520b23a1a64ec217262d9adb3a325aa1e0f0f8f2f898 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..a63e77971c67c5a66c4c9ab240dc787c799802d6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eac72fedafa07c8b8e9d689818e7580167ebe83e79deb087a5dca880cac45c59 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..84119ef969cd2c8ceb652e53cd50533f23f90ec5 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c048514d93ee952a58e5c47d2a3d026b75c7fe159c30284e0acbef45b06991c1 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..0da5cd08741a67e9d3a394698dacdaa86771ba64 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/14.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5a4f64d0f7e05f05af135c32fac30cb5a6ce48aff3cd1cf94c25eabab17f1b8 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..5e1509735386778cc0b7d1a4f1779a9bbc6ebbf0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:35e6ff6405052e035a371f65b36b64c95a5ff27f0a93138a832877ba5017ffb1 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..8753535c339be477a35f6cd154db33b4e9c515d7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c62b02294a439951ade399e2743c844552464cc8c42c96b0b370b29e54dee44 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..b3e0f54cc1cb78a3c3047a5365a33a125e353de4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2ea0a404b55a30f797283edaa057ba5f6f3a3b7239bb70e30b17c6fd36eaf028 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..7ac8f3196177c3790064c11e75bef4ed61043f25 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/15.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c3e9226b36e6afb5653b1c0a5560ce8629c75bcfa650d418e65afda3fde00986 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..e58abce1bad17b800f29500c2e4d6a3f7a252d83 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:096696b2a4f45ce2436ed9c61e705af62f4d65df317cb9b63d7739e0a095d001 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..b474e161d37163f242b5dcf12e7211d379794500 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37ba61ceb7bdbe55e83a24e9f4875c03c5d2c15600685abff54e41eea2c2b4bf +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..c7150ff30b76cb5fb1a58c9a6aad02201d0fef0c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a7988fc45857543788a15d983ed1b23bedbf3ba057b5503eab06c9d91bff7e3 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..e8b1aaeb02b6a849c42a1af94f7e84d85d4583df --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/16.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e90a6a68f76b45fdde8809cc6ae268ea767f3422c212661e8912a4b802d65644 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ff1aa4ffb92616868a6357959b8e8a7124dabdf9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:635730209343fd7ecdbeebe221adfd278f2f548f8946a9bda5f059db75c35755 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..8e74d655b18e0aabe133921cb57c7516ad8cafba --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c021fc78c3d6007136dcd545a8b1b68a8f48deb0b0369a1563b7810b769be7a0 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..67e3617442be2b1bd31d35d0d35fc1a923f2e332 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:41f3fa158bb8711cf750017ab70622c78da4b30fd1960c2b21a1d1355b3ea442 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..895a48a4a3f030c257a82d791d40bc596f86407a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/17.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d80452bec3e4cc53e54366fabcf409096f12b54c12f788305586d78ca0b6107a +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..94feb51789885234a972eef67d03391a72897b3b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f0991cabad25e6e5110836c4525b5b62e44de870324d5353359af27f3f986169 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..7987e6c8605d157923e1a4c47662c941cac200a8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e277919d36aa277fce27f002c03fd2c08a389bac52850342c1f431b966fe320 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..837c7727aa7cf3447a4a324528c879cb605227cd --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f470b43952e09a92baa811feb98b8dbdb31bba88bc525caaa32812cb2082e30 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..5a1ae3b390053be31f99d7261529ca1c424402b9 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/18.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eb93da8223e7ef1a23358b063d1dabc1c20f18f5f70de415399257fee935b657 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..73d6609e1b19a17ecff6b820d32fca0563d60a86 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ef67ffb040ea4cf576dc8c823be59b820159d99f80f94a3b199528b8a423a12 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..17d3458868345dd20174dbf1331a50a02c205ebe --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2cd844a4a6ebd3d86a0290312875dc4d0ffcca50764563ac5bea38e7f34a1544 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..d22e89e09cca818a72c4618d63a9721d8a038841 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7911ef23bf42a55592c38e9264040df29266e3ab732e01dc8f56a9564841fb60 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..2c4d135b43719042f98293ac9fb9d87821cde6ad --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/19.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aa554ba34a902a2b21825bc6d2cd97ee4036a8dbecf0e75f7eb393482757ed17 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..32a20021cfccaad41e505d595929e8642525dac7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d8b568a6d135f90cb0c9ee71213f803871a78a05a332a32577352362949a23f +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..13b4bfe49a4890c4f5798f8424681e950a2ce323 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09bf297f2693becac4309ad6150780f603b2faeabbb88f670de57f07e02ac463 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..db762de44237a645f76fcbc1ec7950169e0ee6ae --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84342b44d47715263e1a279e70a19709490893b885218744b80d4f6d33cffc9a +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..dc317dd462240d6d5a5ca2abea5c55959bc96129 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/2.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6404c2f43962df1a129f7749f2d3ebe11ef4d347029afd9d214f41d8ad1c99ac +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..d7c8c1f5a8be9efcc860a94110d4520c2409b640 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8eb3367fac4b83f8e30984dbe20bbc736f1c41218349ae0e8efb957ca7a0a818 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..567414d133ef177c5e33cf355dcf1b2d62ed596e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:011a7fe37abbc55649edbab93f47fffd13f3ca376201fb3309cca06c8c8b7469 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..8f8fcaf07e5f02ee04201f81b922b6c48fc5989f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:869861ca1d0ddb213aec1bd1c1d8a149fbd91e0551b44cbcc79f876c4868f048 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..776c78db97fb158e6ce312152102e0817e0d6f90 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/20.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7bfe8c457c99a263bdfd79e3d6489d1e25a0fdbd8db937cd68d02a0dcd816c45 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..67e6988709c60d593d82638ab0b081fd95c0b03e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5ab2ab00752c288921f23cbd8c372dd7e0a5f0f03cabac7152dbfee6aff6a52 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..c9c35ea57576bb55b69678b2d36ef5aba03160f2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7716cb23c2dc4e1e0d3d42b6fbfeed15b77696371899029116c728c715370e1c +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..7eaf08c44abbd658163ae69789d47a0d28eae51b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d2d9b9bf4b50e54190d44fd064540f67cc2a862781362f9dd6d1d72d5d75de10 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..255c482ef2220f125a64afe8b2dc55a671f2e2aa --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/21.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:25166d94c46566c11f58e69eaf447857d21cd91233e7710eb728e12a754d790c +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..6cf7386c92195e72fcee62cf90ce69a2d172f835 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a56d7d0ff7e4eda8ffbead7f8d811030750ee0a9390abb1d8c81b08fa60ea80 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..f3df617cbd5d087ad000dee3db7c38cb5976a76e --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:216e3a4b102a55a6ccc2a2907cf237f2061360aa2968460b564130545473dbc7 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..3af87af2ef6284db2403052762dd33de2a5de819 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c653c62e751f36cfb66443a5cc62f473310faf9b9bee9784e8f0cf33f2d91e22 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..8c0840fc4670371a6626915c4229466c49c8eea2 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/22.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84e33dde310e6ef1d99b1d7a6f16ff1eeb331dfd5070bb0c2715e43be943d13f +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..62a9baa8cf369d4d114a6ba911c50572aa146801 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6b57053cfbaa800e6657719990ac6054c8aba3d959c53d454ed4b1edae87091 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..a2b8cfb20a199ee20abe1fc1d5b2d9a0f1977152 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b462e7778b96b171eb32224c78413073c68c61d0f76558012204a2e3247915bd +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..3f65367a180b9784ebd6b1313aeb205eeec6ed33 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6cbdce057f8e97fa411bb9f8647cc477e687223fb162cbc94a846a06cdbd9c8b +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..6ae2013bd392435e0ecdee3fa9d8ebe4bd983dc8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/23.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:165e77daed3c7585cd526203eabc6c687b9e6fc0e34f26ef25007f320504f569 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..94c113925938b7f6448aa78d99fb27a374493739 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0416eca097e174f7d11c6622116aae19f99aec0b971944957f3dd380758369dd +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..0fd08bf285f796ee15e493ca17bea49a474d92d7 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:95907cc5e61e43c18d7c912e22935f19bd991fc88e74cc6f6021c217ec30bbab +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..4d0c1b8ab4fd08625f0572cc441c38ddc7b486bb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:be8157bc173d2ac604482ff4019f215fb47da3892bedcfaf1cb9879e7ad0da16 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..cc85365ad8781880c8b0bf180d3fd1a7f94d781c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/24.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc772ae6394199cca4a363ada221c29fc17de29b988c53527b7a72c6add06e2a +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..4a43e6822093d3fe441a13bceb1f75fd1df7f41d --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61eb29f2f670d89006845b0fc6308f23b7ae751bfb9a1927af9cbfb2164efce5 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..79267d9e0f4b4d9dc7ce055c7b09624d8f53dbe4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:344b8af9044839611b8830aecc37894714f4470a1754b8d5573c203aab2822af +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..ce4431b671974538d181502700ed898862d2932c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6f1268e14bf755b1ba8d973c88de60dabe3db957d59ea494585b22e70f3cefdc +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..8ac0f68b949d1cb0accdbc6fef270a2508924c91 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/25.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c4914b431342cfe4dbff969708e3bb4e314c68b35457fdc81b3aaa67eef6ad50 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..331377444479c492b6dc40c3e24c2e92b03ca8f4 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed85a404965a10fa447fb1d965599b38f44c28cff2aa10529ee0b1edafdfc180 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..9dd1761388fec712c47d68ab55859b520ff90da0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:603d9b54154f74ac960f3493e3b5da18291f3e5a24dc04bc90080d7de11e0719 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..2db3c8af937945117919e6aa0bd3a097fae0cc67 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff1da24bc9d9182b4401c5180d15cc3d3ecd82a61fdb228dd8ed0f576f0dfa8e +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..345a71e90849fdfcd44fc48d2efd37c79724f864 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/26.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2155926581602e3b9033e5019847fcf0a13ef359cc25b3bad3b3b6e30433c6b +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..45a2478d88bc9fc6e5b27d397ad288f40a17a71a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b4c147d247da86631f01a8abbae338d803e746b3f20ca34e6b01a35375eebb78 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..15dc6295ec9ab2a178b2867e25a5ccc83b3c90d6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:74323bccf4f66c2516ec68b592d8b4a454729945067b2e607fd36f4da9f75a29 +size 18874368 diff 
--git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..24ddc4603857d71677c2157cc9a1f33af1359e67 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d733e983333c3a921ab675ec0d8279d52398bf14741e064e22a19743547a7a0a +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..ac95faa2d316d11ace9a1203b07ad73993c5f234 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/27.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7abd6e20cdf7c5978150d13de64a437a328f5416aff5a228adc48889a85fa40 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..0cfecbe9f54fab095e6daa2a2c40fcdc8fb4e379 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:486dd8f8fd171bff182092adc3c6f7459622bc9578af10e2b2b17fa88f3f89f6 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..9677e3bac28f50ca835c21cda0f9f4ce40abe7a6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03bd7b66ecfc7dfa7da0d7dd2191e6c3ffb0fb23c078e3c92ac0751440c5735b +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..9de1c8b6c9b91a16473161a6844f55e9f15d97ab --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae4eebbcb2ddbc5279e84c9b332bff8a92ff855eb94c40ff7e77db5991abe0ef +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..21c0892763ac792c85b154dff8172bec3896a110 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/3.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7923e8925f39c7daa1de848ddb4880b432b77357eab4ca88e68ba9e873c5334d +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..16da6db1ce569b85c375763472cf9f0eed3472cc --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e22070a544169498650eb1d99d063e68cad7ba776c0ff4502715292112d4b90 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..0a8eb58db83f18f79a243a4a3c5f427ec4b5761a --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:38ef18fbdf12f5fb45ed073aae9863be3a58bfd9786fdb1b0884fefdd5804aed +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..f28376d1355772bc426f99ccf03054a64c86ec0c --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f92b9dc34d890e9fba30dd070d3331b872844359ea15d2ec6c6d6564699fd4b +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..4473568998129c774887851954951b8a43d7cce8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/4.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe6546efc4cb223084ffc772c7be04321ef195e4c811b3025ed0815d924ff46e +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..876dbed01e0a5c23d699740e651eb064e1d47ed3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4127a23ac0600b20e874dc124d410ff498e7f12a88b70bce41a986f09700a038 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..7b3fa167e2447d878ef7b3f9c162ad8059725f7b --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff7d16b165d491686e42eb5af5ef8c587e4776ba08a207fd6ac4b67dd38cfa34 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 new file mode 100644 index 
0000000000000000000000000000000000000000..b63fc3ef186630532c97d5b1dce173080c0c38a6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ed3300749247e597ec172842d58b75de6171d57d10b66340b561f27dafc66978 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..33ce4c847655ba353715b2d7831b5ea3b8b2cefa --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/5.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9c99ade95694c756c89558603d0e5ad83ce10dcbc84cb9d73852026ce081cfa1 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..100ba251890304239701df7db5f274f85fce8ef1 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5327c356d2d9dec55b7208d7404851938877953c5d1b24bda7a67ae88d2d4583 +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..90aca9627f2182764ea263a84567e6cb2e5efffb --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e04e7422c8b0d6611893b3143c5694b1d8bf00fcc0637226418f08bad0723ce6 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..89284ffc81f20402fdfc65056129dda6f1c94ad6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e2f3d163a9668e35534a4e139413577049e8dfbb8fcff0ab0bd5dbe418b1816c +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 new file mode 100644 index 
0000000000000000000000000000000000000000..faf8975a597132e2ccb63c513dd9fe1e9589c7e0 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/6.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54e315d21f68239e588370285a79330beed348aab67adfb63336ed7d0721c45e +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..71eb2aaf0c37366c899ef58b7dae1ca360768f83 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a20a17eb54d284b9cc44a3aa547003c4d9049ad393de8bb6d965b82771094e49 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..8efc304257c13c7021cdd9778d6a6116a9eeba32 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1bf8b19e07196b5508c6851886ce4cf6c40e8e43dad6839fade978451ef8b139 +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..a590fb8732d898d38623fe8e609bb6aa7cf15f21 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6da1402a48de5dc11ebbbcc6b6f11e132953f2e61d69cd12d944dd6723863021 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..b4aa682d7d0d949310d2a15541f0ab61919d2db8 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/7.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:46cd812f0c552b6d1c5f231b39bee11af8dec8d2c4d679fecb22cc56898792a7 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 new file mode 100644 index 
0000000000000000000000000000000000000000..0c6e3825377e5c5781cde3ce53d70b99f9febe27 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3cbee0e2dae3a938680b3e042966b6ea9a8939e5f7f24b1de2cb556b096a857a +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 new file mode 100644 index 0000000000000000000000000000000000000000..0995adfadc6e80bc82f5d973183ecd42bbd9c647 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b967a29ef28328950943f51125bf82e6816b682090bb511c4fbde4dada1fcf7 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..57a656f9f8cde96c7f2964fdc7ca5ee6db7908da --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c6c91e7e7786572e926647fec06187a9e4ea5934827bf0382372bf8525e0008c +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..6982d07ee2d594014f02f0101fa49eb13af65502 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/8.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:af30b7a2358636e000e1ad45f6f7a3d94c92aa9617d892086641d07d80d345aa +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 new file mode 100644 index 0000000000000000000000000000000000000000..2be57d821d576dea4174b6d016ccccebc81386e6 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f72493faee96ebe1166da06d6a3f795e99d0ce4466b322610a3d00f1f71f60e1 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 new file mode 100644 index 
0000000000000000000000000000000000000000..ae191933cd3a279fa8d6acfdd580803052498c4f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09e42cc9fe6d56984a7625ad1625fad62f48c18bfb4c15421060315286fd0265 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 new file mode 100644 index 0000000000000000000000000000000000000000..fe9d335190f78f74604b8a8f598cb7598a7f0f64 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1dfd0ad9a753b65d31833e8b801b19b3ea88a8a28606d18e3abe0d9ec391d029 +size 18874368 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 new file mode 100644 index 0000000000000000000000000000000000000000..7c0735c9a4ee1079a653931dd74c86c2cb5cf97f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.decoder.layers.self_attention.linear_qkv.weight/9.3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec856e0e68d0dd7f8d86a18480d429cd96e4edffab1c6fe54eee469e8bec5b4b +size 18874368 diff --git 
a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/.zarray b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..4ea1d5f502a52f7c36ea1be822efd4035aa252ae --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/.zarray @@ -0,0 +1,16 @@ +{ + "chunks": [ + 64000, + 3072 + ], + "compressor": null, + "dtype": "bfloat16", + "fill_value": null, + "filters": null, + "order": "C", + "shape": [ + 256000, + 3072 + ], + "zarr_format": 2 +} \ No newline at end of file diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/0.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/0.0 new file mode 100644 index 0000000000000000000000000000000000000000..ec8d9e1960a13d220aa38b5de0048d0b98dfe0e3 --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/0.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e6e37ff20422f5db68d2678cfa320d1673e8fa4a297cb3ef2ae9fc2a0da990a6 +size 393216000 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/1.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/1.0 new file mode 100644 index 0000000000000000000000000000000000000000..c9a3c49c8089634464ca2de0a5756206f14aee47 --- /dev/null +++ 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/1.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2c37d404e28a7412edf78997eb04ac3460540e61f3e206cab71c20526276475d +size 393216000 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/2.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/2.0 new file mode 100644 index 0000000000000000000000000000000000000000..e2eba2e2dc4f7d4edd71ca7ac6211bcaf78098ff --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/2.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:942e80d936d371c500dbacc3389ae7c529c2072ab07f347e90eb102487724a8b +size 393216000 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/3.0 b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/3.0 new file mode 100644 index 0000000000000000000000000000000000000000..ec547aa3505cd64cfeb72ac7c77177a718fe199f --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/model.embedding.word_embeddings.weight/3.0 @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c743d8e85365aa2c25472b5d13c0c38d798ad79819a5567302951f9a1079828 +size 393216000 diff --git a/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.final_layernorm.weight/.zarray 
b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.final_layernorm.weight/.zarray new file mode 100644 index 0000000000000000000000000000000000000000..da2ce6f95967ba2d604c30a33803df2fc3263bbf --- /dev/null +++ b/nemo/checkpoints/megatron_gpt_sft--validation_loss=0.000-step=613-consumed_samples=78464-epoch=1-last/optimizer.state.exp_avg.model.decoder.final_layernorm.weight/.zarray @@ -0,0 +1,14 @@ +{ + "chunks": [ + 3072 + ], + "compressor": null, + "dtype": "` is experimental and not ready for production yet. Use at your own risk. +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo W 2024-03-18 07:47:22 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/apex/transformer/pipeline_parallel/utils.py:81: UserWarning: This function is only for unittest + warnings.warn("This function is only for unittest") + +[NeMo W 2024-03-18 07:53:39 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:359: UserWarning: `ModelCheckpoint(monitor='validation_loss')` could not find the monitored key in the returned metrics: ['step', 'consumed_samples', 'epoch', 'train_grad_norm', 'train_lr', 'train_loss', 'train_consumed_samples', 'train_step_time', 'train_epoch', 'val_loss', 'val_validation_step_time']. HINT: Did you call `log('validation_loss', value)` in the `LightningModule`? 
+ warning_cache.warn(m) + diff --git a/nemo/nemo_log_globalrank-0_localrank-0.txt b/nemo/nemo_log_globalrank-0_localrank-0.txt new file mode 100644 index 0000000000000000000000000000000000000000..836df8ba03f8e299b82ec8cee966182146d23847 --- /dev/null +++ b/nemo/nemo_log_globalrank-0_localrank-0.txt @@ -0,0 +1,270 @@ +[NeMo W 2024-03-18 05:24:26 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. + ret = run_job( + +[NeMo I 2024-03-18 05:24:26 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:24:26 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + 
sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true 
+ optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:24:26 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/lightning_fabric/connector.py:554: UserWarning: bf16 is supported for historical reasons but its usage is discouraged. Please set your precision to bf16-mixed instead! + rank_zero_warn( + +[NeMo W 2024-03-18 05:24:26 exp_manager:708] Exp_manager is logging to models/gemma-7b-sql-nemo, but it already exists. +[NeMo W 2024-03-18 05:24:26 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. +[NeMo I 2024-03-18 05:24:26 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:24:27 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:24:54 megatron_init:241] Rank 0 has data parallel group : [0, 4] +[NeMo I 2024-03-18 05:24:54 megatron_init:247] Rank 0 has combined group of data parallel and context parallel : [0, 4] +[NeMo I 2024-03-18 05:24:54 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:24:54 megatron_init:255] Ranks 0 has data parallel rank: 0 +[NeMo I 2024-03-18 05:24:54 megatron_init:272] Rank 0 has context parallel group: [0] +[NeMo I 2024-03-18 05:24:54 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:24:54 megatron_init:276] Ranks 0 has context parallel rank: 0 +[NeMo I 2024-03-18 05:24:54 megatron_init:287] Rank 0 has model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:24:54 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:24:54 megatron_init:298] Rank 0 has tensor model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:24:54 megatron_init:302] All 
tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:24:54 megatron_init:303] Rank 0 has tensor model parallel rank: 0 +[NeMo I 2024-03-18 05:24:54 megatron_init:317] Rank 0 has pipeline model parallel group: [0] +[NeMo I 2024-03-18 05:24:54 megatron_init:329] Rank 0 has embedding group: [0] +[NeMo I 2024-03-18 05:24:54 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:24:54 megatron_init:336] Rank 0 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:24:54 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:24:54 megatron_init:338] Rank 0 has embedding rank: 0 +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:24:54 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpjkayda0k/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:24:54 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:24:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:24:55 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:27 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:27 train_script_utils:169] Running full finetuning since no peft scheme is given. + | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:27 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:27 text_memmap_dataset:525] Processing 1 data files using 104 workers +[NeMo I 2024-03-18 05:27:29 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:01.612749 +[NeMo I 2024-03-18 05:27:30 text_memmap_dataset:525] Processing 1 data files using 104 workers +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:01.441462 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000906 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:525] Processing 1 data files using 104 workers +[NeMo I 2024-03-18 05:27:33 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:01.411864 +[NeMo I 2024-03-18 05:27:33 text_memmap_dataset:525] Processing 1 data files using 104 workers +[NeMo I 
2024-03-18 05:27:34 text_memmap_dataset:535] Time building 0 / 1 mem-mapped files: 0:00:01.369279 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000861 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 0, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) +[NeMo W 2024-03-18 07:47:22 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/apex/transformer/pipeline_parallel/utils.py:81: UserWarning: This function is only for unittest + warnings.warn("This function is only for unittest") + +[NeMo W 2024-03-18 07:53:39 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:359: UserWarning: `ModelCheckpoint(monitor='validation_loss')` could not find the monitored key in the returned metrics: ['step', 'consumed_samples', 'epoch', 'train_grad_norm', 'train_lr', 'train_loss', 'train_consumed_samples', 'train_step_time', 'train_epoch', 'val_loss', 'val_validation_step_time']. HINT: Did you call `log('validation_loss', value)` in the `LightningModule`? + warning_cache.warn(m) + diff --git a/nemo/nemo_log_globalrank-1_localrank-1.txt b/nemo/nemo_log_globalrank-1_localrank-1.txt new file mode 100644 index 0000000000000000000000000000000000000000..624eaa6c39cd4afec55acae0291e3541849f983d --- /dev/null +++ b/nemo/nemo_log_globalrank-1_localrank-1.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:56 megatron_init:241] Rank 1 has data parallel group : [1, 5] +[NeMo I 2024-03-18 05:25:56 megatron_init:247] Rank 1 has combined group of data parallel and context parallel : [1, 5] +[NeMo I 2024-03-18 05:25:56 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:255] Ranks 1 has data parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:272] Rank 1 has context parallel group: [1] +[NeMo I 2024-03-18 05:25:56 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:276] Ranks 1 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:287] Rank 1 has model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:25:56 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:298] Rank 1 has tensor model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:25:56 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:303] Rank 1 has tensor model parallel rank: 1 +[NeMo I 2024-03-18 05:25:56 megatron_init:317] Rank 1 has pipeline model parallel group: [1] +[NeMo I 2024-03-18 05:25:56 megatron_init:329] Rank 1 has embedding group: [1] +[NeMo I 2024-03-18 05:25:56 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:336] Rank 1 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:338] Rank 1 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:56 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpqymm0qxt/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:56 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:30 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:30 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:30 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000896 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000631 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 1, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) diff --git a/nemo/nemo_log_globalrank-2_localrank-2.txt b/nemo/nemo_log_globalrank-2_localrank-2.txt new file mode 100644 index 0000000000000000000000000000000000000000..3d2a3082758b1a978ec249f1ea7c05dd27f0677b --- /dev/null +++ b/nemo/nemo_log_globalrank-2_localrank-2.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:56 megatron_init:241] Rank 2 has data parallel group : [2, 6] +[NeMo I 2024-03-18 05:25:56 megatron_init:247] Rank 2 has combined group of data parallel and context parallel : [2, 6] +[NeMo I 2024-03-18 05:25:56 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:255] Ranks 2 has data parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:272] Rank 2 has context parallel group: [2] +[NeMo I 2024-03-18 05:25:56 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:276] Ranks 2 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:287] Rank 2 has model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:25:56 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:298] Rank 2 has tensor model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:25:56 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:303] Rank 2 has tensor model parallel rank: 2 +[NeMo I 2024-03-18 05:25:56 megatron_init:317] Rank 2 has pipeline model parallel group: [2] +[NeMo I 2024-03-18 05:25:56 megatron_init:329] Rank 2 has embedding group: [2] +[NeMo I 2024-03-18 05:25:56 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:336] Rank 2 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:338] Rank 2 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:56 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmphn9tv6o9/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:56 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:30 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:30 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:30 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000776 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000614 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 2, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) diff --git a/nemo/nemo_log_globalrank-3_localrank-3.txt b/nemo/nemo_log_globalrank-3_localrank-3.txt new file mode 100644 index 0000000000000000000000000000000000000000..c929abf0e382424b8d5b12320a37fdda903d83ca --- /dev/null +++ b/nemo/nemo_log_globalrank-3_localrank-3.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:57 megatron_init:241] Rank 3 has data parallel group : [3, 7] +[NeMo I 2024-03-18 05:25:57 megatron_init:247] Rank 3 has combined group of data parallel and context parallel : [3, 7] +[NeMo I 2024-03-18 05:25:57 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:57 megatron_init:255] Ranks 3 has data parallel rank: 0 +[NeMo I 2024-03-18 05:25:57 megatron_init:272] Rank 3 has context parallel group: [3] +[NeMo I 2024-03-18 05:25:57 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:57 megatron_init:276] Ranks 3 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:57 megatron_init:287] Rank 3 has model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:25:57 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:57 megatron_init:298] Rank 3 has tensor model parallel group: [0, 1, 2, 3] +[NeMo I 2024-03-18 05:25:57 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:57 megatron_init:303] Rank 3 has tensor model parallel rank: 3 +[NeMo I 2024-03-18 05:25:57 megatron_init:317] Rank 3 has pipeline model parallel group: [3] +[NeMo I 2024-03-18 05:25:57 megatron_init:329] Rank 3 has embedding group: [3] +[NeMo I 2024-03-18 05:25:57 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:57 megatron_init:336] Rank 3 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:57 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:57 megatron_init:338] Rank 3 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:57 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpe7phpf8c/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:57 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:57 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:29 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:29 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:29 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000700 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000550 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 3, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) diff --git a/nemo/nemo_log_globalrank-4_localrank-4.txt b/nemo/nemo_log_globalrank-4_localrank-4.txt new file mode 100644 index 0000000000000000000000000000000000000000..e0be1e338c7e82c28c6d8fe3a8b170766b35eb47 --- /dev/null +++ b/nemo/nemo_log_globalrank-4_localrank-4.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:12 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:12 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:12 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:13 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:13 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:13 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:53 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:54 megatron_init:241] Rank 4 has data parallel group : [0, 4] +[NeMo I 2024-03-18 05:25:54 megatron_init:247] Rank 4 has combined group of data parallel and context parallel : [0, 4] +[NeMo I 2024-03-18 05:25:54 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:54 megatron_init:255] Ranks 4 has data parallel rank: 1 +[NeMo I 2024-03-18 05:25:54 megatron_init:272] Rank 4 has context parallel group: [4] +[NeMo I 2024-03-18 05:25:54 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:54 megatron_init:276] Ranks 4 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:54 megatron_init:287] Rank 4 has model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:54 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:54 megatron_init:298] Rank 4 has tensor model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:54 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:54 megatron_init:303] Rank 4 has tensor model parallel rank: 0 +[NeMo I 2024-03-18 05:25:54 megatron_init:317] Rank 4 has pipeline model parallel group: [4] +[NeMo I 2024-03-18 05:25:54 megatron_init:329] Rank 4 has embedding group: [4] +[NeMo I 2024-03-18 05:25:54 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:54 megatron_init:336] Rank 4 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:54 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:54 megatron_init:338] Rank 4 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:54 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpxl0xev51/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:54 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:54 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:54 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:28 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:28 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:28 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000759 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000591 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 0, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) diff --git a/nemo/nemo_log_globalrank-5_localrank-5.txt b/nemo/nemo_log_globalrank-5_localrank-5.txt new file mode 100644 index 0000000000000000000000000000000000000000..a3adbd2f62b86552a12a426be61affc0d428ce54 --- /dev/null +++ b/nemo/nemo_log_globalrank-5_localrank-5.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:56 megatron_init:241] Rank 5 has data parallel group : [1, 5] +[NeMo I 2024-03-18 05:25:56 megatron_init:247] Rank 5 has combined group of data parallel and context parallel : [1, 5] +[NeMo I 2024-03-18 05:25:56 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:255] Ranks 5 has data parallel rank: 1 +[NeMo I 2024-03-18 05:25:56 megatron_init:272] Rank 5 has context parallel group: [5] +[NeMo I 2024-03-18 05:25:56 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:276] Ranks 5 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:287] Rank 5 has model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:298] Rank 5 has tensor model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:303] Rank 5 has tensor model parallel rank: 1 +[NeMo I 2024-03-18 05:25:56 megatron_init:317] Rank 5 has pipeline model parallel group: [5] +[NeMo I 2024-03-18 05:25:56 megatron_init:329] Rank 5 has embedding group: [5] +[NeMo I 2024-03-18 05:25:56 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:336] Rank 5 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:338] Rank 5 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:56 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpbmpxr8ky/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:56 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:30 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:30 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:30 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000825 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000659 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 1, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) diff --git a/nemo/nemo_log_globalrank-6_localrank-6.txt b/nemo/nemo_log_globalrank-6_localrank-6.txt new file mode 100644 index 0000000000000000000000000000000000000000..16e4c9d99a21f08bf9fb2f65dd45f834e2399b56 --- /dev/null +++ b/nemo/nemo_log_globalrank-6_localrank-6.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:56 megatron_init:241] Rank 6 has data parallel group : [2, 6] +[NeMo I 2024-03-18 05:25:56 megatron_init:247] Rank 6 has combined group of data parallel and context parallel : [2, 6] +[NeMo I 2024-03-18 05:25:56 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:255] Ranks 6 has data parallel rank: 1 +[NeMo I 2024-03-18 05:25:56 megatron_init:272] Rank 6 has context parallel group: [6] +[NeMo I 2024-03-18 05:25:56 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:276] Ranks 6 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:287] Rank 6 has model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:298] Rank 6 has tensor model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:303] Rank 6 has tensor model parallel rank: 2 +[NeMo I 2024-03-18 05:25:56 megatron_init:317] Rank 6 has pipeline model parallel group: [6] +[NeMo I 2024-03-18 05:25:56 megatron_init:329] Rank 6 has embedding group: [6] +[NeMo I 2024-03-18 05:25:56 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:336] Rank 6 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:338] Rank 6 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:56 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpnw5cea4l/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:56 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:29 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:29 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:29 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000681 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000545 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 2, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + ) diff --git a/nemo/nemo_log_globalrank-7_localrank-7.txt b/nemo/nemo_log_globalrank-7_localrank-7.txt new file mode 100644 index 0000000000000000000000000000000000000000..4a4f425cfc7be4db3cda8729f82c0f9ec580f440 --- /dev/null +++ b/nemo/nemo_log_globalrank-7_localrank-7.txt @@ -0,0 +1,252 @@ +[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. + See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information. 
+ ret = run_job( + +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118] + + ************** Experiment configuration *********** +[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119] + name: gemma-7b-sql-nemo + trainer: + num_nodes: 1 + devices: 8 + accelerator: gpu + precision: bf16 + sft: + max_epochs: 1 + max_steps: -1 + val_check_interval: 1000 + save_interval: ${.val_check_interval} + limit_val_batches: 40 + gradient_clip_val: 1.0 + logger: false + enable_checkpointing: false + use_distributed_sampler: false + max_time: null + max_epochs: ${.sft.max_epochs} + max_steps: ${.sft.max_steps} + exp_manager: + explicit_log_dir: models/gemma-7b-sql-nemo + exp_dir: null + name: ${name} + create_wandb_logger: false + wandb_logger_kwargs: + project: null + name: null + resume_if_exists: true + resume_ignore_no_checkpoint: true + create_checkpoint_callback: true + checkpoint_callback_params: + monitor: validation_loss + save_top_k: 5 + mode: min + save_nemo_on_train_end: true + filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch} + model_parallel_size: ${model.tensor_model_parallel_size} + save_best_model: false + model: + seed: 1234 + tensor_model_parallel_size: 4 + pipeline_model_parallel_size: 1 + restore_from_path: /workspace/models/pytorch-7b-pt.nemo + resume_from_checkpoint: null + save_nemo_on_validation_end: true + sync_batch_comm: false + megatron_amp_O2: true + encoder_seq_length: 4096 + sequence_parallel: false + activations_checkpoint_granularity: null + activations_checkpoint_method: null + activations_checkpoint_num_layers: null + activations_checkpoint_layers_per_pipeline: null + answer_only_loss: true + gradient_as_bucket_view: false + seq_len_interpolation_factor: null + use_flash_attention: null + hidden_dropout: 0.0 + attention_dropout: 0.0 + ffn_dropout: 0.0 + peft: + peft_scheme: none + restore_from_path: null + lora_tuning: + target_modules: + - attention_qkv + adapter_dim: 32 + adapter_dropout: 0.0 + column_init_method: xavier + 
row_init_method: zero + layer_selection: null + weight_tying: false + position_embedding_strategy: null + data: + chat: false + chat_prompt_tokens: + system_turn_start: "\0" + turn_start: "\x11" + label_start: "\x12" + end_of_turn: ' + + ' + end_of_name: ' + + ' + sample: false + num_workers: 0 + dataloader_type: single + train_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: true + memmap_workers: null + max_seq_length: 8192 + min_seq_length: 1 + drop_last: true + label_key: output + add_eos: true + add_sep: false + add_bos: false + truncation_field: input + index_mapping_dir: null + prompt_template: '{input} {output}' + hf_dataset: false + truncation_method: right + validation_ds: + file_path: nsql.jsonl + global_batch_size: 128 + micro_batch_size: 1 + shuffle: false + memmap_workers: ${model.data.train_ds.memmap_workers} + max_seq_length: ${model.data.train_ds.max_seq_length} + min_seq_length: 1 + drop_last: true + label_key: ${model.data.train_ds.label_key} + add_eos: ${model.data.train_ds.add_eos} + add_sep: ${model.data.train_ds.add_sep} + add_bos: ${model.data.train_ds.add_bos} + truncation_field: ${model.data.train_ds.truncation_field} + index_mapping_dir: null + prompt_template: ${model.data.train_ds.prompt_template} + hf_dataset: false + truncation_method: right + output_original_text: true + optim: + name: distributed_fused_adam + lr: 5.0e-06 + weight_decay: 0.01 + betas: + - 0.9 + - 0.98 + sched: + name: CosineAnnealing + warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + bias_activation_fusion: true + +[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch. 
+[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo +[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:55 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. 
Add this key to cfg or config_mapping to make to make it configurable. +[NeMo I 2024-03-18 05:25:56 megatron_init:241] Rank 7 has data parallel group : [3, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:247] Rank 7 has combined group of data parallel and context parallel : [3, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:255] Ranks 7 has data parallel rank: 1 +[NeMo I 2024-03-18 05:25:56 megatron_init:272] Rank 7 has context parallel group: [7] +[NeMo I 2024-03-18 05:25:56 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:276] Ranks 7 has context parallel rank: 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:287] Rank 7 has model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:298] Rank 7 has tensor model parallel group: [4, 5, 6, 7] +[NeMo I 2024-03-18 05:25:56 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:303] Rank 7 has tensor model parallel rank: 3 +[NeMo I 2024-03-18 05:25:56 megatron_init:317] Rank 7 has pipeline model parallel group: [7] +[NeMo I 2024-03-18 05:25:56 megatron_init:329] Rank 7 has embedding group: [7] +[NeMo I 2024-03-18 05:25:56 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:336] Rank 7 has pipeline model parallel rank 0 +[NeMo I 2024-03-18 05:25:56 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]] +[NeMo I 2024-03-18 05:25:56 megatron_init:338] Rank 7 has embedding rank: 0 +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have 
field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo I 2024-03-18 05:25:56 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpus1ap94c/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model +[NeMo I 2024-03-18 05:25:56 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable. 
+[NeMo W 2024-03-18 05:25:56 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable. +[NeMo W 2024-03-18 05:25:56 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backwardGEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1 + warnings.warn( + +[NeMo I 2024-03-18 05:27:28 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo. +[NeMo I 2024-03-18 05:27:28 train_script_utils:169] Running full finetuning since no peft scheme is given. 
+ | Name | Type | Params + ---------------------------------------- + 0 | model | Float16Module | 2.1 B + ---------------------------------------- + 2.1 B Trainable params + 0 Non-trainable params + 2.1 B Total params + 8,538.206 Total estimated model params size (MB) +[NeMo I 2024-03-18 05:27:28 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000792 +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000646 +[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0 +[NeMo W 2024-03-18 05:27:34 experimental:26] `` is experimental and not ready for production yet. Use at your own risk. +[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 3, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09. 
+[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam ( + Parameter Group 0 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.01 + + Parameter Group 1 + betas: [0.9, 0.98] + bias_correction: True + eps: 1e-08 + lr: 5e-06 + weight_decay: 0.0 + ) +[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "" + will be used during training (effective maximum steps = 613) - + Parameters : + (warmup_steps: 10 + constant_steps: 1000 + min_lr: 9.0e-07 + max_steps: 613 + )