---
license: apache-2.0
language:
- en
tags:
- text-generation
- gpt2
- knowledge-distillation
- symbolic-reasoning
- from-scratch
datasets:
- HuggingFaceFW/fineweb-edu
pipeline_tag: text-generation
---
# 124M GPT with Symbolic Reasoning Distillation
A **124M-parameter** GPT-2 trained **from scratch** on [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) with **knowledge distillation** from [SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct).
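The checkpoint can be loaded like any GPT-2-style causal LM. The snippet below is a minimal usage sketch assuming the weights are stored in the standard `transformers` format; the repo id shown is a placeholder, not the actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace with this model's actual Hub path.
repo_id = "your-username/gpt2-124m-symbolic-distill"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The water cycle begins when", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```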
Key configuration and training details:

| Component | Value |
|-----------|-------|
| Parameters | ~124M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 512 tokens |
| Loss | 0.5 CE + 0.5 KL (cross-entropy + distillation KL) |
| Hardware | 1× A100 |
| Training time | ~75 min |
| Training tokens | 327,680,000 |
| Best loss | 326.0111 |
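The training objective weights hard-label cross-entropy and distillation KL equally. Below is a minimal PyTorch sketch of that objective under two assumptions not stated in this card: the student and teacher logits are over a shared vocabulary and aligned token positions, and the distillation temperature is 1.0.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=1.0):
    """0.5 * CE + 0.5 * KL, per the Loss row above.

    Assumes student and teacher logits share a vocabulary and token
    alignment; T=1.0 is an assumed temperature (none is stated).
    """
    vocab = student_logits.size(-1)
    student_flat = student_logits.view(-1, vocab)
    teacher_flat = teacher_logits.view(-1, vocab)

    # Hard-label cross-entropy against the FineWeb-Edu next-token targets.
    ce = F.cross_entropy(student_flat, labels.view(-1), ignore_index=-100)

    # KL divergence from teacher to student token distributions,
    # scaled by T^2 as is conventional for distillation.
    kl = F.kl_div(
        F.log_softmax(student_flat / T, dim=-1),
        F.softmax(teacher_flat / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)

    # Equal weighting, as listed under "Loss" in the table.
    return 0.5 * ce + 0.5 * kl
```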