124M GPT with Symbolic Reasoning Distillation

Trained from scratch on a two-stream data mix with dual-alpha distillation, i.e. a separate teacher-distillation weight (alpha) per stream:

| Stream | Dataset | Alpha | Purpose |
|---|---|---|---|
| General | FineWeb-Edu | 0.2 | Language modeling, light teacher guidance |
| Reasoning | GSM8K chain-of-thought | 0.8 | Heavy distillation: teacher guides step-by-step math reasoning |
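
The card doesn't show the loss itself. A minimal sketch of what "dual-alpha distillation" plausibly means, assuming the standard soft-target formulation (the function name, `temperature`, and masking convention below are illustrative assumptions, not taken from the training code):

```python
import torch.nn.functional as F

def dual_alpha_distill_loss(student_logits, teacher_logits, labels, alpha, temperature=2.0):
    # student_logits, teacher_logits: (batch, seq, vocab); labels: (batch, seq).
    # Teacher logits come from the frozen SmolLM-135M-Instruct forward pass.
    vocab = student_logits.size(-1)

    # Hard-label next-token cross-entropy on the ground-truth tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, vocab), labels.view(-1), ignore_index=-100
    )

    # Soft-target KL between temperature-scaled teacher and student distributions,
    # rescaled by t^2 as in standard knowledge distillation.
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits.view(-1, vocab) / t, dim=-1),
        F.softmax(teacher_logits.view(-1, vocab) / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # alpha = 0.2 on the general stream, 0.8 on the reasoning stream.
    return (1.0 - alpha) * ce + alpha * kl
```
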
- Teacher: SmolLM-135M-Instruct (frozen)
- Time: ~75 min on 1x A100
- Tokens: 327,680,000 (0 reasoning / 20,000 general batches, i.e. 16,384 tokens per batch)
- Best loss: 186.6474
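
For a quick smoke test, the checkpoint should load through the standard `transformers` causal-LM API if the uploaded config follows a stock GPT-2 layout; a fully custom architecture would instead need `trust_remote_code=True` or its own loading code. The prompt is illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "farpluto/zubenelgenubi-1.1-124m"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# A GSM8K-style word problem, matching the reasoning stream's format.
prompt = "Question: A pack of 4 pens costs $6. How much do 10 pens cost?\nAnswer:"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```
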
