Blueberry-Nano (151M)

Blueberry-Nano is a GPT-1-level, 151M-parameter language model trained from scratch as part of the 5-Dollar-LLM project. It exists to teach the end-to-end LLM training process; it cannot compete with today's LLMs (yet).

Model Details

  • Developed by: Open Superintelligence Lab
  • Model Type: Transformer Decoder-only (Dense)
  • Architecture: 32 layers, GQA (4 KV heads), 512 embedding dim (see the parameter-count sketch below)
  • Language(s): English
  • License: MIT
  • Training Tokens: 1 Billion (1B)
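
As a rough sanity check on the figures above, here is a back-of-envelope parameter count. The card does not state the head count, head dimension, MLP width, or vocabulary size, so the values used below (8 query heads of dimension 64, a 4x GELU MLP, a GPT-2-sized vocabulary of 50,257, untied embeddings) are assumptions for illustration only.

```python
# Back-of-envelope parameter count under ASSUMED hyperparameters; only the
# depth (32), width (512), and KV-head count (4) come from the model card.
d, layers, vocab = 512, 32, 50257          # vocab size is an assumption
n_kv_heads, head_dim = 4, 64               # head_dim is an assumption

attn = d * d                               # Q projection
attn += 2 * d * (n_kv_heads * head_dim)    # K and V projections (GQA)
attn += d * d                              # output projection
mlp = 2 * d * (4 * d)                      # up + down, assuming a 4x GELU MLP

total = layers * (attn + mlp) + 2 * vocab * d  # untied input/output embeddings
print(f"{total / 1e6:.0f}M")                   # ~144M, in the ballpark of 151M
```

The gap to the reported 151M would be absorbed by whichever of these assumptions (vocabulary size, MLP width, embedding tying) differs in the actual model.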

Training Environment

  • Hardware: Single NVIDIA RTX 4090 (24GB)
  • Training Time: 156.8 minutes (~2.6 hours)
  • Optimizer: Muon & AdamW (see the sketch after this list)
  • Precision: Automatic Mixed Precision (AMP)
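
Below is a minimal sketch of how this optimizer pairing is typically wired up under AMP; it is not the project's actual training code. The Muon import assumes the reference implementation from github.com/KellerJordan/Muon is available locally as muon.py (its class name and signature have varied across versions), the parameter split and all hyperparameters are illustrative, and the model is a two-layer stand-in for the real 32-layer transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from muon import Muon  # assumes muon.py from github.com/KellerJordan/Muon

# Two-layer stand-in for the real 32-layer GQA transformer.
model = nn.Sequential(nn.Embedding(1000, 512), nn.Linear(512, 1000)).cuda()

# Assumed split: Muon updates the hidden 2D weight matrices (attention and
# MLP projections in the real model); AdamW covers embeddings and the rest.
muon_params = [model[1].weight]
adamw_params = [p for p in model.parameters() if p is not model[1].weight]

opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)    # illustrative values
opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4, weight_decay=0.1)

scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 autocast
tokens = torch.randint(0, 1000, (8, 64), device="cuda")  # fake token batch

for step in range(10):
    opt_muon.zero_grad(set_to_none=True)
    opt_adamw.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(tokens)  # (batch, seq, vocab) next-token logits
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, 1000),
                               tokens[:, 1:].reshape(-1))
    scaler.scale(loss).backward()
    scaler.step(opt_muon)   # one GradScaler can drive both optimizers...
    scaler.step(opt_adamw)
    scaler.update()         # ...with a single scale update per iteration
```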

Results

Final metrics after 1B tokens:

  • Validation Loss: 3.1940
  • Validation Accuracy: 40.19%
  • Validation Perplexity: 24.38
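
These numbers are internally consistent: perplexity is just the exponential of the mean cross-entropy loss.

```python
import math

# Perplexity = exp(mean cross-entropy loss)
print(math.exp(3.1940))  # ~24.38, matching the reported perplexity
```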

Training Plot

[training curves image]

Usage

This model is a base model trained on a mix of educational data (FineWeb-Edu, Cosmopedia v2, and Python-Edu). It demonstrates reasonable storytelling and factual recall for its size, but it may hallucinate and has not been fine-tuned for instruction following.
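
A minimal loading-and-generation sketch is shown below, assuming the Hub checkpoint is compatible with transformers' Auto classes; if the repository ships a custom architecture, trust_remote_code=True may also be required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint loads through the Auto classes; add
# trust_remote_code=True if the repo defines a custom architecture.
repo = "vukrosic/Blueberry-Nano-151M"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("The water cycle begins when", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                        temperature=0.8, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because this is a base model, prompt it with text to continue rather than with instructions.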

Historical Context

This model (151M parameters) matches the scale of OpenAI's original GPT-1 (117M parameters) but was trained in under 3 hours on a single consumer GPU, illustrating how dramatically training efficiency has improved in recent years.


Created by the Open Superintelligence Lab.


Datasets used to train vukrosic/Blueberry-Nano-151M

  • FineWeb-Edu
  • Cosmopedia v2
  • Python-Edu

Evaluation Results

Self-reported validation metrics on the FineWeb-Edu, Cosmopedia v2, Python-Edu mix:

  • Validation Accuracy: 0.402
  • Validation Loss: 3.194
  • Validation Perplexity: 24.380