---
license: apache-2.0
language:
  - en
tags:
  - text-generation
  - gpt2
  - knowledge-distillation
  - symbolic-reasoning
  - from-scratch
datasets:
  - HuggingFaceFW/fineweb-edu
pipeline_tag: text-generation
---

# 124M GPT with Symbolic Reasoning Distillation

A 124M-parameter GPT-2-style model trained from scratch on FineWeb-Edu with knowledge distillation from SmolLM-135M-Instruct.
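The training objective weights hard-label cross-entropy and a KL-divergence term against the teacher's output distribution at 0.5 each (the "0.5 CE + 0.5 KL" row below). A minimal per-token sketch of that combination, in plain Python; temperature scaling, which distillation setups often add, is omitted here, and the function names are illustrative rather than taken from this repo's training code:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def distill_loss(student_logits, teacher_logits, target):
    """0.5 * cross_entropy(student, target) + 0.5 * KL(teacher || student)."""
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    ce = -log_ps[target]  # hard-label cross-entropy on the ground-truth token
    # KL(p_teacher || p_student) = sum_i p_t[i] * (log p_t[i] - log p_s[i])
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    return 0.5 * ce + 0.5 * kl
```

When the student exactly matches the teacher, the KL term vanishes and the loss reduces to half the cross-entropy, which is a quick way to sanity-check the implementation.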

| Component     | Value                |
|---------------|----------------------|
| Parameters    | ~124M                |
| Layers        | 12                   |
| Heads         | 12                   |
| Embedding dim | 768                  |
| Context       | 512                  |
| Loss          | 0.5 CE + 0.5 KL      |
| Hardware      | 1x A100              |
| Time          | ~75 min              |
| Tokens        | 327,680,000          |
| Best loss     | 326.0111             |
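The ~124M parameter figure follows from the hyperparameters above. A quick sanity check of the standard GPT-2 parameter count, assuming the usual 50,257-token GPT-2 vocabulary and tied input/output embeddings (neither is stated in this card):

```python
def gpt2_param_count(n_layer=12, n_head=12, d=768, n_ctx=512, vocab=50257):
    # n_head splits d across heads but does not change the parameter count.
    ln = 2 * d                        # LayerNorm: weight + bias
    attn = d * 3 * d + 3 * d          # fused q/k/v projection (weight + bias)
    attn += d * d + d                 # attention output projection
    mlp = d * 4 * d + 4 * d           # MLP up-projection to 4*d
    mlp += 4 * d * d + d              # MLP down-projection back to d
    block = 2 * ln + attn + mlp       # one transformer block
    # token embeddings + learned position embeddings + blocks + final LayerNorm;
    # the LM head is tied to the token embedding matrix, so it adds nothing.
    return vocab * d + n_ctx * d + n_layer * block + ln

print(gpt2_param_count())  # 124046592, i.e. ~124M
```

With a 512-token context the position embedding table is half the size of GPT-2 small's (which uses 1024), which is why this lands slightly below the canonical 124.4M.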