HobbyLM-30M
A 31.9M-parameter dense transformer trained from scratch on 1B tokens of FineWeb.
Built on top of HobbyLM by rootxhacker.
Training
- Parameters: 31.9M (fully dense)
- Dataset: FineWeb (1B tokens)
- Steps: 3800
- Final val loss: 3.9077
- Architecture: 8 layers, d_model=384, 6 attention heads, grouped-query attention (GQA), RoPE, RMSNorm
- Optimizer: Muon
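The figures in the Training list imply a few derived quantities: the per-head dimension, the average number of tokens per optimizer step, and the validation perplexity. The Python sketch below computes them. It assumes that the 1B tokens were consumed in a single pass over the 3800 steps and that the reported loss is the mean per-token cross-entropy in nats. The card states neither of these. The class name `HobbyLM30MConfig` is hypothetical and is not taken from HobbyLM's code. The card also does not list how many key/value heads GQA uses, so that value is left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HobbyLM30MConfig:
    # Hypothetical container; the values are copied from the Training list.
    n_layers: int = 8
    d_model: int = 384
    n_heads: int = 6  # query heads; the GQA key/value head count is not stated
    train_tokens: int = 1_000_000_000
    train_steps: int = 3800
    final_val_loss: float = 3.9077

    @property
    def head_dim(self) -> int:
        # Each query head gets an equal slice of the model width.
        assert self.d_model % self.n_heads == 0
        return self.d_model // self.n_heads

    @property
    def tokens_per_step(self) -> float:
        # Assumption: one pass over the 1B tokens, spread evenly across all steps.
        return self.train_tokens / self.train_steps

    @property
    def val_perplexity(self) -> float:
        # Assumption: the loss is mean per-token cross-entropy in nats.
        return math.exp(self.final_val_loss)


cfg = HobbyLM30MConfig()
print(cfg.head_dim)                   # 64
print(round(cfg.tokens_per_step))     # 263158
print(round(cfg.val_perplexity, 1))   # 49.8
```

Under these assumptions, each step processes roughly 263K tokens. A validation loss of 3.9077 corresponds to a perplexity of about 49.8.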
This is a base model
No instruction tuning. It generates fluent English but drifts off topic, which is expected at this scale.