# Results: final_c6_18l448_factorized_aggressive

Automatically generated after pretraining.

## Summary

- Model: `18L / 7H / 448d`
- Total parameters: `39600320`
- Last logged train step: `92680`
- Best validation loss: `3.4662`
- Best validation perplexity: `32.01`
- Last validation step: `92500`
- Learning rate: `0.00056`
- Effective tokens/update: `65536`
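
The summary figures can be cross-checked with a little arithmetic. The sketch below makes two assumptions that the report does not state: that the validation loss is a mean cross-entropy in nats, so perplexity is `exp(loss)`, and that each logged train step is one optimizer update consuming the full effective batch. The helper names `perplexity` and `tokens_seen` are hypothetical and exist only for this illustration.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss, assuming the loss is in nats."""
    return math.exp(loss_nats)


def tokens_seen(steps: int, tokens_per_update: int) -> int:
    """Total training tokens, assuming one optimizer update per logged step."""
    return steps * tokens_per_update


# Values copied from the Summary list.
best_val_loss = 3.4662
last_train_step = 92680
tokens_per_update = 65536
total_parameters = 39600320

total_tokens = tokens_seen(last_train_step, tokens_per_update)

print(f"perplexity: {perplexity(best_val_loss):.2f}")              # 32.01
print(f"tokens seen: {total_tokens:,}")                            # 6,073,876,480
print(f"tokens per parameter: {total_tokens / total_parameters:.1f}")  # 153.4
```

Under those assumptions the reported perplexity of `32.01` agrees with the reported loss, and the run processed roughly 6.07 billion tokens.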

## Files

- [Config snapshot](config_snapshot.json)
- [Train metrics](train_metrics.jsonl)
- [Eval metrics](eval_metrics.jsonl)
- [Events](events.jsonl)
- [Metrics plot](metrics.png)
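
The `.jsonl` files hold one JSON object per line, so the best validation point can be recovered with the standard library alone. This is a minimal sketch: the field names `step` and `val_loss` are assumptions (the schema is not documented here), the helper `best_record` is hypothetical, and the sample records are placeholder values, not data from this run.

```python
import json


def best_record(jsonl_text: str, key: str = "val_loss"):
    """Return the record with the smallest value of `key`, or None if absent.

    `key` defaults to "val_loss", which is an assumed field name.
    """
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    scored = [record for record in records if key in record]
    return min(scored, key=lambda record: record[key]) if scored else None


# Placeholder records for illustration only; not taken from eval_metrics.jsonl.
sample = "\n".join([
    '{"step": 100, "val_loss": 4.2}',
    '{"step": 200, "val_loss": 3.9}',
    '{"step": 300, "val_loss": 4.0}',
])

best = best_record(sample)
print(best["step"], best["val_loss"])  # 200 3.9
```

To run it on the real log, read the file first, for example `best_record(open("eval_metrics.jsonl").read())`, and adjust `key` to whatever field the file actually uses.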

## Metrics Plot

![Metrics plot](metrics.png)