---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt2
- knowledge-distillation
- symbolic-reasoning
- chain-of-thought
- from-scratch
datasets:
- HuggingFaceFW/fineweb-edu
- openai/gsm8k
pipeline_tag: text-generation
---

# 124M GPT with Symbolic Reasoning Distillation

Trained from scratch on a mix of general and reasoning data with dual-alpha knowledge distillation:
| Stream | Dataset | Alpha | Purpose |
|---|---|---|---|
| General | FineWeb-Edu | 0.2 | Language modeling, light teacher guidance |
| Reasoning | GSM8K chain-of-thought | 0.8 | Heavy distillation: teacher guides step-by-step math reasoning |
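The per-stream alpha can be read as the weight on the teacher's soft targets versus the hard-label cross-entropy. A minimal sketch of such a blended loss is below, assuming a standard temperature-scaled KL distillation term; the function name, signature, and temperature value are illustrative, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def dual_alpha_loss(student_logits, teacher_logits, targets, alpha, T=2.0):
    """Blend hard-label CE with teacher KL, weighted by the stream's alpha."""
    # Hard-label cross-entropy against the ground-truth next tokens.
    ce = F.cross_entropy(student_logits, targets)
    # Temperature-scaled KL divergence against the frozen teacher's
    # distribution; the T*T factor keeps gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # alpha=0.2 for general batches (light guidance),
    # alpha=0.8 for reasoning batches (heavy distillation).
    return (1 - alpha) * ce + alpha * kd
```

With this formulation a general batch keeps most of its gradient signal from the language-modeling objective, while a GSM8K chain-of-thought batch is dominated by the teacher's distribution.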
- Teacher: SmolLM-135M-Instruct (frozen)
- Training time: ~75 min on 1× A100
- Tokens: 327,680,000 (0 reasoning / 20,000 general batches)
- Best loss: 186.6474