---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt2
- knowledge-distillation
- symbolic-reasoning
- from-scratch
datasets:
- HuggingFaceFW/fineweb-edu
pipeline_tag: text-generation
---

# 124M GPT with Symbolic Reasoning Distillation

A **124M-parameter** GPT-2 trained **from scratch** on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) with **knowledge distillation** from [SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct).

| Component | Value |
|-----------|-------|
| Parameters | ~124M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 512 tokens |
| Loss | 0.5 CE + 0.5 KL |
| Hardware | 1× A100 |
| Training time | ~75 min |
| Tokens seen | 327,680,000 |
| Best loss | 326.0111 |
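The distillation objective above (equal-weight cross-entropy on ground-truth tokens plus KL divergence against the teacher's distribution) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual training code; the function name, `alpha`, and `temperature` arguments are assumptions, and it presumes the student and teacher share a vocabulary so their logits align.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      alpha=0.5, temperature=1.0):
    """Combined loss: alpha * CE(student, targets) + (1 - alpha) * KL(teacher || student).

    student_logits, teacher_logits: (batch, seq_len, vocab_size)
    targets: (batch, seq_len) ground-truth token ids
    """
    vocab = student_logits.size(-1)
    # Hard-label cross-entropy against the ground-truth next tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab), targets.view(-1))
    # KL divergence between temperature-softened teacher and student distributions;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl
```

With `alpha=0.5` this reproduces the 0.5 CE + 0.5 KL weighting listed in the table; in practice the teacher's logits would come from a frozen SmolLM-135M-Instruct forward pass with `torch.no_grad()`.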