---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt2
- knowledge-distillation
- symbolic-reasoning
- from-scratch
datasets:
- HuggingFaceFW/fineweb-edu
pipeline_tag: text-generation
---
# 124M GPT with Symbolic Reasoning Distillation
A 124M-parameter GPT-2-style model trained from scratch on FineWeb-Edu, with knowledge distillation from SmolLM-135M-Instruct as the teacher.
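The ~124M figure can be reproduced from GPT-2 small's standard shapes (12 layers, 12 heads, 768-dim embeddings, 512-token context). A quick back-of-envelope check, assuming the GPT-2 tokenizer's 50,257-token vocabulary and a tied LM head (neither is stated on this card):

```python
# Rough parameter count from the hyperparameters in the table below.
# Vocab size 50257 is an assumption (GPT-2 BPE tokenizer); LM head assumed tied to wte.
V, T, L, D = 50257, 512, 12, 768  # vocab, context, layers, embedding dim

emb = V * D + T * D                   # token + position embeddings
per_block = (
    2 * 2 * D                         # two LayerNorms (weight + bias each)
    + D * 3 * D + 3 * D               # attention QKV projection
    + D * D + D                       # attention output projection
    + D * 4 * D + 4 * D               # MLP up-projection (4x expansion)
    + 4 * D * D + D                   # MLP down-projection
)
total = emb + L * per_block + 2 * D   # plus final LayerNorm; tied head adds nothing
print(f"{total / 1e6:.1f}M")          # ≈ 124.0M
```

With a 512-token context the position embedding is smaller than stock GPT-2's (1024 positions), but the total still rounds to ~124M.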
| Component | Value |
|---|---|
| Parameters | ~124M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 512 tokens |
| Loss | 0.5 × CE + 0.5 × KL |
| Hardware | 1× A100 |
| Training time | ~75 min |
| Training tokens | 327,680,000 |
| Best loss | 326.0111 |
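The combined objective in the table (0.5 CE + 0.5 KL) can be sketched as follows. This is a minimal numpy illustration, not the training code: it assumes the common formulation KL(teacher ∥ student) over soft targets, and omits the temperature scaling that distillation setups often add.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    # Cross-entropy of the student against the hard next-token labels.
    p_student = softmax(student_logits)
    n = labels.shape[0]
    ce = -np.log(p_student[np.arange(n), labels]).mean()
    # KL(teacher || student) over the teacher's soft targets.
    p_teacher = softmax(teacher_logits)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean()
    # Equal weighting, as in the table: 0.5 * CE + 0.5 * KL.
    return alpha * ce + (1 - alpha) * kl

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
labels = np.array([0, 2])
# When teacher and student agree exactly, the KL term vanishes
# and the loss reduces to half the cross-entropy.
loss = distill_loss(logits, logits, labels)
```

In practice the KL term is what transfers the teacher's full output distribution (including its ranking of wrong answers) rather than just the hard labels.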