End of training
23f827c
- attn_layer_mapper=last, attn_loss_fn=mse, attn_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5: Training in progress, step 61875
- attn_layer_mapper=layer-2, attn_loss_fn=mse, attn_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5: Training in progress, step 61875
- dataset_sample_size=1000000, lr_scheduler_type=cosine, warmup_ratio=0.5: End of training
- dataset_sample_size=1000000: Training in progress, step 247500
- dataset_subset=default, dataset_uri=distily_c4_multilingual_1M, lr_scheduler_type=cosine, warmup_ratio=0.5: Training in progress, step 61875
- hs_layer_mapper=last, hs_loss_fn=mse, hs_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5: Training in progress, step 61875
- hs_layer_mapper=layer-2, hs_loss_fn=mse, hs_weight=1.0, lr_scheduler_type=cosine, warmup_ratio=0.5: Training in progress, step 61875
- lr_scheduler_type=cosine, warmup_ratio=0.5: Training in progress, step 61875
- lr_scheduler_type=inverse_sqrt, warmup_ratio=0.5: End of training
- lr_scheduler_type=linear, warmup_ratio=0.5: Training in progress, step 61875
- 0 Bytes: Training in progress, step 61875
- 5.26 kB: End of training
- 3.78 MB: End of training
- 29.7 MB: End of training
- 588 Bytes: Training in progress, step 61875