huiting tang

Results: final_c6_18l448_factorized_aggressive

Automatically generated after pretraining.

Summary

  • Model: 18L / 7H / 448d (18 layers, 7 attention heads, model width 448)
  • Total parameters: 39,600,320 (≈39.6M)
  • Last logged train step: 92,680
  • Best validation loss: 3.4662
  • Best validation perplexity: 32.01
  • Last validation step: 92,500
  • Learning rate: 0.00056
  • Effective tokens/update: 65,536
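The summary numbers are linked by a few simple identities. The sketch below makes them explicit. It rests on three assumptions: the validation loss is mean per-token cross-entropy in nats, "7H" means 7 attention heads that split the 448-wide model evenly, and every logged train step is one optimizer update of the stated token count. The helper names are hypothetical and are not taken from the training code.

```python
import math

# Values copied from the summary above.
BEST_VAL_LOSS = 3.4662        # assumed: mean per-token cross-entropy in nats
LAST_TRAIN_STEP = 92_680      # assumed: one optimizer update per logged step
TOKENS_PER_UPDATE = 65_536
D_MODEL = 448
N_HEADS = 7


def perplexity(loss_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(loss_nats)


def head_dim(d_model: int, n_heads: int) -> int:
    """Per-head width, assuming the model width is split evenly across heads."""
    if d_model % n_heads:
        raise ValueError("d_model must be divisible by n_heads")
    return d_model // n_heads


def approx_tokens_seen(steps: int, tokens_per_update: int) -> int:
    """Rough training-token budget: updates times tokens per update."""
    return steps * tokens_per_update


if __name__ == "__main__":
    print(f"perplexity ~ {perplexity(BEST_VAL_LOSS):.2f}")   # 32.01
    print(f"head dim = {head_dim(D_MODEL, N_HEADS)}")         # 64
    print(f"tokens seen ~ {approx_tokens_seen(LAST_TRAIN_STEP, TOKENS_PER_UPDATE):,}")  # 6,073,876,480
```

Under these assumptions, exp(3.4662) ≈ 32.01 reproduces the reported best perplexity. Each head would be 64 wide. The run would have processed roughly 6.07B tokens by the last logged step.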

Files

Metrics Plot

[Figure: metrics plot]