---
license: apache-2.0
language:
  - en
tags:
  - text-generation
  - gpt2
  - knowledge-distillation
  - symbolic-reasoning
  - from-scratch
datasets:
  - HuggingFaceFW/fineweb-edu
pipeline_tag: text-generation
---

# 124M GPT with Symbolic Reasoning Distillation

A 124M-parameter GPT-2-style model trained from scratch on FineWeb-Edu with knowledge distillation from SmolLM-135M-Instruct.
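The training objective weights hard-label cross-entropy and a KL-divergence term against the teacher's output distribution at 0.5 each (the "0.5 CE + 0.5 KL" row below). A minimal per-token sketch of that combination, in plain Python; temperature scaling, which distillation setups often add, is omitted here, and the function names are illustrative rather than taken from this repo's training code:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def distill_loss(student_logits, teacher_logits, target):
    """0.5 * cross_entropy(student, target) + 0.5 * KL(teacher || student)."""
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    ce = -log_ps[target]  # hard-label cross-entropy on the ground-truth token
    # KL(p_teacher || p_student) = sum_i p_t[i] * (log p_t[i] - log p_s[i])
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return 0.5 * ce + 0.5 * kl
```

When the student exactly matches the teacher, the KL term vanishes and the loss reduces to half the cross-entropy, which is a quick way to sanity-check the implementation.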

| Component     | Value                |
|---------------|----------------------|
| Parameters    | ~124M                |
| Layers        | 12                   |
| Heads         | 12                   |
| Embedding dim | 768                  |
| Context       | 512                  |
| Loss          | 0.5 CE + 0.5 KL      |
| Hardware      | 1x A100              |
| Time          | ~75 min              |
| Tokens        | 327,680,000          |
| Best loss     | 326.0111             |
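The ~124M parameter figure follows from the hyperparameters above. A quick sanity check of the standard GPT-2 parameter count, assuming the usual 50,257-token GPT-2 vocabulary and tied input/output embeddings (neither is stated in this card):

```python
def gpt2_param_count(n_layer=12, n_head=12, d=768, n_ctx=512, vocab=50257):
    # n_head splits d across heads but does not change the parameter count.
    ln = 2 * d                        # LayerNorm: weight + bias
    attn = d * 3 * d + 3 * d          # fused q/k/v projection (weight + bias)
    attn += d * d + d                 # attention output projection
    mlp = d * 4 * d + 4 * d           # MLP up-projection to 4*d
    mlp += 4 * d * d + d              # MLP down-projection back to d
    block = 2 * ln + attn + mlp       # one transformer block
    # token embeddings + learned position embeddings + blocks + final LayerNorm;
    # the LM head is tied to the token embedding matrix, so it adds nothing.
    return vocab * d + n_ctx * d + n_layer * block + ln

print(gpt2_param_count())  # 124046592, i.e. ~124M
```

With a 512-token context the position embedding table is half the size of GPT-2 small's (which uses 1024), which is why this lands slightly below the canonical 124.4M.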