---
language: en
license: mit
tags:
- gpt2
- language-model
- pytorch
---

# GPT-2 Model - Trained from Scratch

This is a GPT-2 model trained from scratch on [your dataset].

## Model Details

- **Architecture**: GPT-2
- **Parameters**: 124M
- **Training Steps**: 19073
- **Validation Loss**: 4.21
- **HellaSwag Accuracy**: 25.58%
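The 124M figure corresponds to the standard GPT-2 "small" configuration (12 layers, 12 heads, 768-dim embeddings, 50257-token vocabulary — assumed here, since the card does not spell them out). A quick sketch of where that count comes from, assuming tied input/output embeddings:

```python
# Standard GPT-2 small hyperparameters (assumed; not stated in the card above)
n_layer, n_embd, vocab, block = 12, 768, 50257, 1024

wte = vocab * n_embd              # token embedding (tied with the output head)
wpe = block * n_embd              # learned position embedding
per_layer = (
    n_embd * 3 * n_embd + 3 * n_embd    # attention qkv projection (+ bias)
    + n_embd * n_embd + n_embd          # attention output projection (+ bias)
    + n_embd * 4 * n_embd + 4 * n_embd  # MLP up-projection (+ bias)
    + 4 * n_embd * n_embd + n_embd      # MLP down-projection (+ bias)
    + 4 * n_embd                        # two LayerNorms (weight + bias each)
)
final_ln = 2 * n_embd
total = wte + wpe + n_layer * per_layer + final_ln
print(f"{total / 1e6:.1f}M parameters")  # ≈ 124.4M
```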

## Training Configuration

- Batch Size: 32
- Sequence Length: 1024
- Learning Rate: 6e-4
- Total Tokens: ~1B
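A peak learning rate of 6e-4 is typically paired with linear warmup and cosine decay in Karpathy-style GPT-2 training runs. As a hedged sketch (the warmup length and minimum learning rate are assumptions, not stated in this card):

```python
import math

MAX_LR = 6e-4          # peak learning rate from the configuration above
MIN_LR = MAX_LR * 0.1  # assumed floor (a common choice, not stated above)
WARMUP_STEPS = 715     # assumption; such runs often warm up for a few hundred steps
MAX_STEPS = 19073      # total training steps from the model details

def get_lr(step):
    """Linear warmup followed by cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= MAX_STEPS:
        return MIN_LR
    ratio = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # decays from 1 to 0
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```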

## Usage

```python
import torch

# Load the saved checkpoint (current checkpoint; to be updated after training completes)
checkpoint = torch.load('model_hf_09000.pt', map_location='cpu')
config = checkpoint['config']

# Initialize the model and restore its weights
from your_model_file import GPT  # your GPT class
model = GPT(config)
model.load_state_dict(checkpoint['model'])
model.eval()

# Generate text
# ... your generation code ...
```
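The generation step is left open above. As a hedged sketch (it assumes, in the style of Karpathy's GPT implementations, that `model(idx)` returns logits of shape `(batch, seq, vocab)` — adapt the forward call if yours returns a `(logits, loss)` tuple), autoregressive sampling with top-k filtering typically looks like:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=1024, top_k=50):
    """Autoregressively sample new tokens, restricting each draw to the top-k logits."""
    for _ in range(max_new_tokens):
        # Crop the context to the model's maximum sequence length
        idx_cond = idx[:, -block_size:]
        logits = model(idx_cond)
        # Keep only the logits for the last position
        logits = logits[:, -1, :]
        # Mask everything below the k-th largest logit
        v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
        logits[logits < v[:, [-1]]] = -float('inf')
        # Sample the next token and append it to the sequence
        probs = F.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_tok], dim=1)
    return idx
```

With a tokenizer such as `tiktoken.get_encoding('gpt2')` (an assumption about the training setup), encode a prompt into a `(1, T)` long tensor for `idx`, then decode the returned token ids back to text.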

## Training Details

Trained using a custom GPT-2 implementation following Andrej Karpathy's approach.