GPT-2 Scratch Small FTN Baseline

This checkpoint is a small GPT-2 model trained from scratch on TinyStories, used as the baseline for FTN comparison experiments.

Training summary

  • Layers: 4
  • Hidden size: 256
  • Heads: 4
  • Context length: 256
  • Best validation loss: 3.590700
  • Best epoch: 10
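
These hyperparameters map directly onto a standard GPT2Config. A minimal sketch of reconstructing the architecture (vocab_size is an assumption, the stock GPT-2 vocabulary; the other values come from the summary above):

from transformers import GPT2Config, GPT2LMHeadModel

# Rebuild the architecture described above. vocab_size is an assumption;
# all other values are taken from the training summary.
config = GPT2Config(
    vocab_size=50257,   # assumed: standard GPT-2 tokenizer vocabulary
    n_layer=4,          # Layers
    n_embd=256,         # Hidden size
    n_head=4,           # Heads
    n_positions=256,    # Context length
)
model = GPT2LMHeadModel(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")

With the assumed 50257-token vocabulary this comes out to roughly 16.1M parameters, which matches the model size reported below.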

Files

  • standard Hugging Face GPT-2 weights and config
  • metrics_summary.json
  • metrics_history.csv
  • samples.json / samples.txt
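
The metrics files can be inspected with the standard library. A minimal sketch (the exact fields inside these files are not documented here, so it simply prints whatever is present):

import csv
import json

# Best/final metrics recorded for the run.
with open("metrics_summary.json") as f:
    print(json.load(f))

# Per-step/per-epoch training history, one logged row at a time.
with open("metrics_history.csv") as f:
    for row in csv.DictReader(f):
        print(row)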

Load the model

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load from the checkpoint directory; replace '.' with this repo's id on the
# Hugging Face Hub if you are not running from a local clone.
model = AutoModelForCausalLM.from_pretrained('.')
tokenizer = AutoTokenizer.from_pretrained('.')
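
Once loaded, the checkpoint samples like any GPT-2 causal LM. A minimal generation sketch (the prompt and sampling settings below are illustrative choices, not taken from this card):

# Sample a short TinyStories-style continuation from the baseline.
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))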

Model details

  • Model size: 16.1M params
  • Tensor type: F32
  • Format: safetensors