Results: final_c6_18l448_factorized_aggressive
Automatically generated after pretraining.
Summary
- Model: 18L / 7H / 448d
- Total parameters: 39600320
- Last logged train step: 92680
- Best validation loss: 3.4662
- Best validation perplexity: 32.01
- Last validation step: 92500
- Learning rate: 0.00056
- Effective tokens/update: 65536
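
The figures above can be cross-checked with a short sketch. It assumes the validation loss is a mean per-token cross-entropy in nats (so perplexity is `exp(loss)`), that "18L / 7H / 448d" means 18 layers, 7 attention heads, and a model width of 448, and that every train step consumed the same effective tokens/update. None of these conventions is stated in this file, and the names in the code are illustrative, not taken from the training code.

```python
import math

# Values copied from the summary above.
N_HEADS = 7                 # assumed meaning of "7H"
D_MODEL = 448               # assumed meaning of "448d"
BEST_VAL_LOSS = 3.4662
LAST_TRAIN_STEP = 92680
TOKENS_PER_UPDATE = 65536


def perplexity_from_loss(loss_nats: float) -> float:
    """Perplexity, assuming the loss is mean per-token cross-entropy in nats."""
    return math.exp(loss_nats)


def approx_tokens_seen(step: int, tokens_per_update: int) -> int:
    """Rough token count, assuming a constant batch size over the whole run."""
    return step * tokens_per_update


def head_dim(d_model: int, n_heads: int) -> int:
    """Per-head width, assuming d_model is split evenly across heads."""
    if d_model % n_heads != 0:
        raise ValueError("d_model is not divisible by n_heads")
    return d_model // n_heads


if __name__ == "__main__":
    print(f"perplexity:  {perplexity_from_loss(BEST_VAL_LOSS):.2f}")
    print(f"tokens seen: {approx_tokens_seen(LAST_TRAIN_STEP, TOKENS_PER_UPDATE):,}")
    print(f"head dim:    {head_dim(D_MODEL, N_HEADS)}")
```

Under these assumptions, the computed perplexity agrees with the reported 32.01 to two decimal places, and the run corresponds to roughly 6.07 billion training tokens by the last logged step.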
