---
language: en
tags:
- pytorch
- gpt2
- language-model
pipeline_tag: text-generation
---
# GPT-X Model

This model was trained using the GPT-X framework.
## Model Architecture

- Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Vocabulary Size: 50257
- Maximum Sequence Length: 1024
- Model Type: base
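Taken together, these hyperparameters match the GPT-2 "small" configuration. As a sanity check, the total parameter count can be estimated directly from the card's values; the sketch below assumes standard GPT-2 weight shapes (learned position embeddings, biased linear layers, pre-norm LayerNorms), which is an assumption rather than something the card states:

```python
# Hypothetical sketch: estimate the parameter count from the card's numbers,
# assuming standard GPT-2 weight shapes (an assumption, not stated in the card).
n_layer, d, vocab, seq = 12, 768, 50257, 1024

emb = vocab * d + seq * d                 # token + position embeddings
attn = (d * 3 * d + 3 * d) + (d * d + d)  # fused QKV proj + output proj, with biases
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # 4x expansion MLP, with biases
ln = 2 * 2 * d                            # two LayerNorms per block (scale + shift)
per_layer = attn + mlp + ln

total = emb + n_layer * per_layer + 2 * d  # + final LayerNorm
print(f"{total:,}")  # 124,439,808 — the familiar ~124M of GPT-2 small
```

If the checkpoint ties the output head to the token embedding (as GPT-2 does), the head adds no extra parameters beyond this estimate.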
## Training Details

- Batch Size: 524288
- Learning Rate: 0.0006
- Weight Decay: 0.0
- Mixed Precision: True
- Optimizer: muon
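A batch size of 524288 is very likely counted in tokens (it is exactly 2**19, a common GPT-style choice) rather than in sequences, which would imply gradient accumulation on most hardware. The sketch below works out the accumulation schedule under that assumption, with a hypothetical per-device micro-batch of 16 sequences:

```python
# Hypothetical: derive gradient-accumulation steps, assuming the card's batch
# size of 524288 is measured in tokens (2**19) — an assumption, not stated.
tokens_per_step = 524_288
seq_len = 1024            # from the card
micro_batch_seqs = 16     # hypothetical per-device micro-batch (sequences)

tokens_per_micro_step = micro_batch_seqs * seq_len
accum_steps = tokens_per_step // tokens_per_micro_step
print(accum_steps)  # 32 under these assumptions
```

With more devices or a larger micro-batch, the accumulation count shrinks proportionally; the optimizer step still sees the same 524288 tokens either way.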