---
language: en
license: mit
tags:
- gpt2
- language-model
- pytorch
---

# GPT-2 Model - Trained from Scratch

This is a GPT-2 model trained from scratch on [your dataset].

## Model Details

- **Architecture**: GPT-2
- **Parameters**: 124M
- **Training Steps**: 19073
- **Validation Loss**: 4.21
- **HellaSwag Accuracy**: 25.58%

## Training Configuration

- Batch Size: 32
- Sequence Length: 1024
- Learning Rate: 6e-4
- Total Tokens: ~1B tokens

## Usage

```python
import torch

# Load the checkpoint (current save; to be updated after training completes)
checkpoint = torch.load('model_hf_09000.pt', map_location='cpu')
config = checkpoint['config']

# Initialize the model and restore the trained weights
from your_model_file import GPT  # your GPT class
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.eval()

# Generate text
# ... your generation code ...
```

## Training Details

Trained using a custom GPT-2 implementation following Andrej Karpathy's approach.
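
The usage snippet above leaves the generation step to the reader. As a minimal sketch, sampling one token at a time with top-k filtering might look like the helper below; `sample_top_k` is a hypothetical name, not part of this repository, and it operates on raw logits so it works with any model that returns them.

```python
import torch

def sample_top_k(logits, k=50, temperature=1.0, generator=None):
    # Hypothetical helper: pick the next token id from the k most
    # likely entries of a (batch, vocab) logits tensor.
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k, dim=-1)
    probs = torch.softmax(topk_vals, dim=-1)
    # Sample an index into the top-k set, then map back to vocab ids.
    choice = torch.multinomial(probs, num_samples=1, generator=generator)
    return topk_idx.gather(-1, choice)

# Usage sketch: in a real loop you would call the model on the running
# sequence, take the logits of the last position, sample, and append.
logits = torch.randn(1, 50257)  # dummy logits over the GPT-2 vocab size
next_token = sample_top_k(logits, k=50)
```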