---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
- common-pile/arxiv_papers_filtered
- tiiuae/falcon-refinedweb
- manu/project_gutenberg
- nampdn-ai/tiny-textbooks
- SciPhi/textbooks-are-all-you-need-lite
- abehandlerorg/ccnews
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# GPT-2 from Scratch

This model implements the GPT-2 architecture (125M parameters), trained from scratch.

## Model Description

- **Model type:** GPT-2 (125M parameters)
- **Architecture:** Transformer-based autoregressive language model following the original GPT-2 design
- **Training data:** Multiple datasets (see the tags above), totaling approximately 18 billion tokens
- **Language:** English
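
Since the model follows the standard GPT-2 architecture, it should be loadable with the `transformers` library. A minimal sketch, assuming the weights are published in a `transformers`-compatible format under the `thecr7guy/gpt2-pretrain` repo id used in the evaluation table:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the evaluation table below; adjust if the
# checkpoint lives elsewhere or was exported in a different format.
repo_id = "thecr7guy/gpt2-pretrain"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The history of the telescope begins", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```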

## Performance and Evaluation

| Dataset        | Metric | thecr7guy/gpt2-pretrain | GPT-2 (baseline) |
|----------------|--------|-------------------------|------------------|
| HellaSwag      | acc    | **0.291**               | 0.289            |
| SciQ           | acc    | **0.754**               | 0.752            |
| Winogrande     | acc    | 0.491                   | **0.516**        |
| TruthfulQA MC1 | acc    | **0.236**               | 0.228            |
| MMLU (overall) | acc    | **0.230**               | 0.229            |
| - Humanities   | acc    | 0.242                   | 0.242            |
| - Social Sci.  | acc    | 0.217                   | 0.217            |
| - STEM         | acc    | 0.213                   | 0.213            |
| - Other        | acc    | **0.239**               | 0.238            |
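
Scores like these are typically measured with EleutherAI's `lm-evaluation-harness`. The card does not state the exact evaluation setup, so the following invocation is only a sketch (task names follow current harness conventions, and the repo id is taken from the table above):

```shell
pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=thecr7guy/gpt2-pretrain \
    --tasks hellaswag,sciq,winogrande,truthfulqa_mc1,mmlu \
    --batch_size 8
```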

## Training Details

- **Training corpus:** Approximately 18B tokens (~120 GB)
- **Training duration:** 1 epoch (approximately 8 hours total)
- **Hardware:** 8× NVIDIA A100 PCIe GPUs via runpod.io
- **Estimated cost:** $108.16 (8 GPUs × $13.52) for the complete run
- **Context length:** 1024 tokens

### Hyperparameters

- context_len: 1024
- seed: 42
- epochs: 2
- batch_size: 64
- total_batch_size: 524288 tokens
- grad_clip: 1.0
- optimizer: "adamw"
- max_lr: 6.0e-4
- min_lr: 6.0e-5
- beta1: 0.9
- beta2: 0.95
- weight_decay: 0.1
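
The paired max_lr/min_lr values suggest the usual GPT-2/GPT-3-style schedule of linear warmup followed by cosine decay. The card does not state the warmup length or total step count, so those numbers in the sketch below are assumptions (total steps estimated as ~18B tokens / 524,288 tokens per step):

```python
import math

max_lr = 6.0e-4   # from the hyperparameter list
min_lr = 6.0e-5   # from the hyperparameter list

# Assumptions, not stated in the card:
warmup_steps = 700
max_steps = 34_332  # ~18B tokens / 524,288 tokens per optimizer step

def get_lr(step):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    decay_ratio = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)

# Batch arithmetic: micro-batch 64 sequences x 1024-token context x 8 GPUs
# = 524,288 tokens per step, exactly total_batch_size, so this run needs
# no gradient accumulation.
grad_accum_steps = 524_288 // (64 * 1024 * 8)
```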
## Setup and Training Commands

- `pip install wandb`
- `pip install tiktoken`
- `pip install --upgrade huggingface_hub`
- `pip install torchinfo`
- `pip install datasets`
- `sudo apt update && sudo apt install tmux`
- `tmux new -s training`
- `wandb login`
- `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NCCL_P2P_DISABLE=1 torchrun --standalone --nproc_per_node=8 train.py`

## Contact

GitHub: [thecr7guy2](https://github.com/thecr7guy2)