---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
- common-pile/arxiv_papers_filtered
- tiiuae/falcon-refinedweb
- manu/project_gutenberg
- nampdn-ai/tiny-textbooks
- SciPhi/textbooks-are-all-you-need-lite
- abehandlerorg/ccnews
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# GPT-2 from Scratch

This model implements the GPT-2 architecture (125M parameters), trained from scratch.

## Model Description

- **Model type:** GPT-2 (125M parameters)
- **Architecture:** Transformer-based autoregressive language model following the original GPT-2 design
- **Training data:** A mix of public datasets (see the dataset tags above), totaling roughly 18 billion tokens
- **Language:** English

## Performance and Evaluation

| Dataset        | Metric | thecr7guy/gpt2-pretrain | GPT-2 (baseline) |
|----------------|--------|-------------------------|------------------|
| HellaSwag      | acc    | **0.291**               | 0.289            |
| SciQ           | acc    | **0.754**               | 0.752            |
| Winogrande     | acc    | 0.491                   | **0.516**        |
| TruthfulQA MC1 | acc    | **0.236**               | 0.228            |
| MMLU (overall) | acc    | **0.230**               | 0.229            |
| - Humanities   | acc    | 0.242                   | 0.242            |
| - Social Sci.  | acc    | 0.217                   | 0.217            |
| - STEM         | acc    | 0.213                   | 0.213            |
| - Other        | acc    | **0.239**               | 0.238            |

## Training Details

- **Training corpus:** Approximately 18B tokens (120 GB)
- **Training duration:** 1 epoch (approximately 8 hours total)
- **Hardware:** 8× NVIDIA A100 PCIe GPUs via runpod.io
- **Estimated cost:** Approximately $108 (8 × $13.52) for the complete run
- **Token context:** 1024 tokens

### Hyperparameters

- context_len: 1024
- seed: 42
- epochs: 2
- batch_size: 64
- total_batch_size: 524288 tokens
- grad_clip: 1.0
- optimizer: "adamw"
- max_lr: 6.0e-4
- min_lr: 6.0e-5
- beta1: 0.9
- beta2: 0.95
- weight_decay: 0.1
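The `max_lr`/`min_lr` pair above suggests a warmup-plus-cosine-decay schedule, which is typical for GPT-2-style pretraining; the actual schedule lives in `train.py` and is not shown in this card. A minimal sketch using the listed hyperparameters, where the warmup length and total step count are assumed placeholders (total steps estimated as ~18B tokens / 524288 tokens per step):

```python
import math

MAX_LR = 6.0e-4     # from the hyperparameter list
MIN_LR = 6.0e-5     # from the hyperparameter list
WARMUP_STEPS = 700  # assumption: not stated in the card
MAX_STEPS = 34000   # assumption: ~18e9 tokens / 524288 tokens per step

# Sanity check: 8 GPUs x 64 sequences x 1024 tokens = 524288 tokens,
# which matches total_batch_size exactly (no gradient accumulation needed).
assert 8 * 64 * 1024 == 524288

def get_lr(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    if step >= MAX_STEPS:
        return MIN_LR
    # progress goes 0 -> 1 over the decay phase; coeff goes 1 -> 0
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```

With this shape the learning rate peaks at `max_lr` right after warmup and decays smoothly to `min_lr` by the final step.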
## Commands Used During Setup

```bash
pip install wandb
pip install tiktoken
pip install --upgrade huggingface_hub
pip install torchinfo
pip install datasets
sudo apt update && sudo apt install tmux
tmux new -s training
wandb login
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NCCL_P2P_DISABLE=1 \
  torchrun --standalone --nproc_per_node=8 train.py
```

## Contact

GitHub: [thecr7guy2](https://github.com/thecr7guy2)
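If the trained checkpoint is published in the standard Hugging Face GPT-2 format, it can be loaded for inference with `transformers`. This is a sketch, not a confirmed usage path: the repo id below is taken from the evaluation table and is assumed to point at the uploaded weights.

```python
from transformers import pipeline

# Assumption: the checkpoint is stored in Hugging Face GPT-2 format under
# this repo id (taken from the evaluation table above); adjust if it differs.
generator = pipeline("text-generation", model="thecr7guy/gpt2-pretrain")

result = generator("The history of science shows that", max_new_tokens=40)
print(result[0]["generated_text"])
```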