---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt2
- knowledge-distillation
- symbolic-reasoning
- from-scratch
datasets:
- HuggingFaceFW/fineweb-edu
pipeline_tag: text-generation
---

# 124M GPT with Symbolic Reasoning Distillation

A **124M-parameter** GPT-2 trained **from scratch** on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) with **knowledge distillation** from [SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct).

| Component | Value |
|-----------|-------|
| Parameters | ~124M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 512 tokens |
| Loss | 0.5 CE + 0.5 KL |
| Hardware | 1× A100 |
| Training time | ~75 min |
| Tokens seen | 327,680,000 |
| Best loss | 326.0111 |
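The distillation objective above (equal-weight cross-entropy on ground-truth tokens plus KL divergence against the teacher's distribution) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual training code; the function name, `alpha`, and `temperature` arguments are assumptions, and it presumes the student and teacher share a vocabulary so their logits align.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha=0.5, temperature=1.0):
    """Combined loss: alpha * CE(student, targets) + (1 - alpha) * KL(teacher || student).

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    targets: (batch, seq_len) ground-truth token ids
    """
    vocab = student_logits.size(-1)
    # Hard-label cross-entropy against the ground-truth next tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab), targets.view(-1))
    # KL divergence between temperature-softened teacher and student distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```

With `alpha=0.5` this reproduces the 0.5 CE + 0.5 KL weighting listed in the table; in practice the teacher's logits would come from a frozen SmolLM-135M-Instruct forward pass with `torch.no_grad()`.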