NanoChat d20 Model

This is a d20 (20-layer, 561M-parameter) ChatGPT-style model trained with the nanochat pipeline.

Model Details

  • Parameters: 560,988,160
  • Layers: 20
  • Hidden dimension: 1280
  • Attention heads: 10
  • Vocabulary size: 65,536
  • Context length: 2048
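
The parameter count above is consistent with a simple back-of-the-envelope calculation, assuming a bias-free GPT-style transformer with untied input/output embeddings and a 4x MLP expansion (these architectural assumptions are inferred from the arithmetic, not stated on this card):

```python
# Hypothetical back-of-the-envelope parameter count for the d20 config above.
# Assumes a bias-free GPT-style transformer with untied input/output
# embeddings and a 4x MLP expansion; variable names are illustrative.
vocab_size = 65_536
d_model = 1280
n_layers = 20

# Token embedding plus a separate (untied) LM head.
embedding_params = 2 * vocab_size * d_model

# Per block: Q, K, V, and output projections (4 * d^2) plus a 4x-wide MLP
# (up- and down-projections: 2 * d * 4d), all without bias terms.
attn_params = 4 * d_model * d_model
mlp_params = 2 * d_model * (4 * d_model)
block_params = attn_params + mlp_params

total = embedding_params + n_layers * block_params
print(total)  # 560988160 -- matches the figure listed above
```

Under these assumptions the total comes out to exactly 560,988,160, matching the listed parameter count; norm layers contribute negligibly and appear to carry no learned parameters here.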

Training

  • BASE stage: FineWeb-Edu pretraining
  • MID stage: SmolTalk + MMLU + GSM8K
  • SFT stage: ARC + GSM8K + SmolTalk