# NanoChat d20 Model

This is a d20 (20-layer, 561M-parameter) ChatGPT-like model trained with the nanochat pipeline.
## Model Details
- Parameters: 560,988,160
- Layers: 20
- Hidden dimension: 1280
- Attention heads: 10
- Vocabulary size: 65,536
- Context length: 2048
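The listed parameter count can be recovered from the config above. A minimal sketch, assuming untied input/output embeddings, bias-free linear layers, full multi-head attention (Q, K, V, and output projections all d x d), and a 4x MLP expansion (these assumptions are not stated in the card, but they reproduce the listed total exactly):

```python
# Recompute the d20 parameter count from the listed config.
# Assumed (not stated in the card): untied token embedding and LM head,
# bias-free linears, full multi-head attention, and a 4x MLP expansion.

vocab_size = 65_536
d_model = 1_280
n_layers = 20

embeddings = 2 * vocab_size * d_model        # token embedding + untied LM head
attention_per_layer = 4 * d_model * d_model  # Wq, Wk, Wv, Wo
mlp_per_layer = 2 * d_model * (4 * d_model)  # up- and down-projection
total = embeddings + n_layers * (attention_per_layer + mlp_per_layer)

print(total)  # 560988160, matching the listed parameter count
```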
## Training

- BASE stage: pretraining on FineWeb-Edu
- MID stage: midtraining on SmolTalk + MMLU + GSM8K
- SFT stage: supervised fine-tuning on ARC + GSM8K + SmolTalk
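The stages above run sequentially, with each stage continuing training from the checkpoint produced by the previous one. A minimal sketch of that flow (the `train_stage` function is a hypothetical placeholder, not nanochat's actual entry point; only the stage order and datasets come from the card):

```python
# Hypothetical sketch of the staged pipeline; only the stage order and
# datasets come from the card. Each stage trains on its datasets and
# hands its weights to the next stage.

def train_stage(weights, name, datasets):
    # Placeholder for a real training loop over the given datasets.
    print(f"{name}: training on {' + '.join(datasets)}")
    return weights + [name]

weights = []  # stands in for the randomly initialized model
weights = train_stage(weights, "BASE", ["FineWeb-Edu"])
weights = train_stage(weights, "MID", ["SmolTalk", "MMLU", "GSM8K"])
weights = train_stage(weights, "SFT", ["ARC", "GSM8K", "SmolTalk"])
print(weights)  # ['BASE', 'MID', 'SFT']
```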