---
language:
  - en
license: mit
library_name: pytorch
tags:
  - gpt2
  - text-generation
  - pytorch
  - causal-lm
  - research
  - ablation
datasets:
  - HuggingFaceFW/fineweb
pipeline_tag: text-generation
---

# GPT-2 (124M) - Research Ablation Baseline

## Model Summary

This is a 124M parameter Causal Language Model (GPT-2 Small architecture) trained entirely from scratch using PyTorch.

It serves as the baseline for a research ablation study of training dynamics and reaches a validation loss of 4.485.

## Model Details

- **Architecture:** Custom GPT-2 Small (decoder-only Transformer)
- **Parameters:** 124M
- **Context Window:** 256 tokens
- **Dimensions:** 768 embedding dim, 12 heads, 12 layers
- **Training Steps:** ~28,500
- **Validation Loss:** 4.485
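
As a sanity check, the quoted 124M parameter count can be reproduced from these dimensions. The sketch below assumes the standard 50257-token GPT-2 BPE vocabulary and tied input/output embeddings, neither of which is stated on the card:

```python
# Back-of-the-envelope parameter count for the card's hyperparameters.
# Assumptions (not stated on the card): 50257-token GPT-2 vocab, tied embeddings.
V, T, L, D = 50257, 256, 12, 768  # vocab, context, layers, embedding dim

embeddings = V * D + T * D  # token + position embeddings
per_block = (
    4 * D * D + 4 * D   # attention: QKV (3*D*D) + output proj (D*D), plus biases
    + 8 * D * D + 5 * D  # MLP: D->4D and 4D->D weights, plus biases
    + 2 * 2 * D          # two LayerNorms (scale + shift each)
)
total = embeddings + L * per_block + 2 * D  # + final LayerNorm
print(f"{total / 1e6:.1f}M parameters")  # -> 123.8M, i.e. the quoted ~124M
```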

## How to Use

⚠️ **Important:** Because this model was trained with a custom PyTorch class (not the standard Hugging Face `GPT2LMHeadModel`), you must define the model architecture in your code before loading the weights.
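
A minimal sketch of such a definition, using the hyperparameters from the card (12 layers, 12 heads, 768-dim embeddings, 256-token context). The class structure, attribute names, checkpoint filename, and vocabulary size are illustrative assumptions — the real training code's `state_dict` keys must be matched exactly for `load_state_dict` to succeed:

```python
# Hypothetical GPT-2 Small-style decoder matching the card's dimensions.
# Class/attribute names are illustrative; align them with the checkpoint's keys.
import torch
import torch.nn as nn

VOCAB_SIZE = 50257  # assumed standard GPT-2 BPE vocab; verify against the checkpoint
BLOCK_SIZE = 256    # context window from the model card
N_LAYER, N_HEAD, N_EMBD = 12, 12, 768

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(N_EMBD)
        self.attn = nn.MultiheadAttention(N_EMBD, N_HEAD, batch_first=True)
        self.ln2 = nn.LayerNorm(N_EMBD)
        self.mlp = nn.Sequential(
            nn.Linear(N_EMBD, 4 * N_EMBD), nn.GELU(), nn.Linear(4 * N_EMBD, N_EMBD)
        )

    def forward(self, x):
        # Causal mask: each position attends only to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, N_EMBD)
        self.pos_emb = nn.Embedding(BLOCK_SIZE, N_EMBD)
        self.blocks = nn.ModuleList(Block() for _ in range(N_LAYER))
        self.ln_f = nn.LayerNorm(N_EMBD)
        self.head = nn.Linear(N_EMBD, VOCAB_SIZE, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits of shape (batch, T, vocab)

model = GPT()
# Loading the published weights (filename is illustrative):
# state_dict = torch.load("model.pt", map_location="cpu")
# model.load_state_dict(state_dict)
```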