# GPT-2 Scratch Small FTN Baseline

This checkpoint is a small GPT-2 model trained from scratch on TinyStories, used as the baseline for FTN comparison experiments.
## Training summary

- Layers: 4
- Hidden size: 256
- Heads: 4
- Context length: 256
- Best validation loss: 3.590700
- Best epoch: 10
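For reference, here is a minimal sketch of a `GPT2Config` matching the hyperparameters above. The exact training configuration is not included in this card, so treat the mapping (and all defaults) as assumptions:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical config reconstructed from the hyperparameters listed above;
# other settings (vocab size, dropout, etc.) are left at transformers defaults.
config = GPT2Config(
    n_layer=4,        # Layers
    n_embd=256,       # Hidden size
    n_head=4,         # Heads
    n_positions=256,  # Context length
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters")
```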
## Files

- Standard Hugging Face GPT-2 weights and config
- `metrics_summary.json`
- `metrics_history.csv`
- `samples.json` / `samples.txt`
## Load the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# '.' loads from the current directory; run this inside a local checkout
# of the repository, or replace '.' with the Hub repo id.
model = AutoModelForCausalLM.from_pretrained('.')
tokenizer = AutoTokenizer.from_pretrained('.')
```
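A minimal generation sketch using the loaded model; the prompt and sampling parameters below are illustrative, not the settings used to produce `samples.json`/`samples.txt`:

```python
# Illustrative sampling settings; adjust as needed.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```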