Blueberry-Nano (151M)

Blueberry-Nano is a GPT-1-level, 151M-parameter language model trained from scratch as part of the 5-Dollar-LLM project. It exists to teach the end-to-end LLM training process; it cannot compete with today's LLMs (yet).

Model Details

  • Developed by: Open Superintelligence Lab
  • Model Type: Transformer Decoder-only (Dense)
  • Architecture: 32 layers, GQA (4 KV heads), 512 embedding dim (see the parameter-count sketch below)
  • Language(s): English
  • License: MIT
  • Training Tokens: 1 Billion (1B)
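
As a rough sanity check on the figures above, here is a back-of-envelope parameter count. The card does not state the head count, head dimension, MLP width, or vocabulary size, so the values used below (8 query heads of dimension 64, a 4x GELU MLP, a GPT-2-sized vocabulary of 50,257, untied embeddings) are assumptions for illustration only.

```python
# Back-of-envelope parameter count under ASSUMED hyperparameters; only the
# depth (32), width (512), and KV-head count (4) come from the model card.
d, layers, vocab = 512, 32, 50257          # vocab size is an assumption
n_kv_heads, head_dim = 4, 64               # head_dim is an assumption

attn = d * d                               # Q projection
attn += 2 * d * (n_kv_heads * head_dim)    # K and V projections (GQA)
attn += d * d                              # output projection
mlp = 2 * d * (4 * d)                      # up + down, assuming a 4x GELU MLP

total = layers * (attn + mlp) + 2 * vocab * d  # untied input/output embeddings
print(f"{total / 1e6:.0f}M")                   # ~144M, in the ballpark of 151M
```

The gap to the reported 151M would be absorbed by whichever of these assumptions (vocabulary size, MLP width, embedding tying) differs in the actual model.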

Training Environment

  • Hardware: Single NVIDIA RTX 4090 (24GB)
  • Training Time: 156.8 minutes (~2.6 hours)
  • Optimizer: Muon & AdamW (see the sketch after this list)
  • Precision: Automatic Mixed Precision (AMP)
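
Below is a minimal sketch of how this optimizer pairing is typically wired up under AMP; it is not the project's actual training code. The Muon import assumes the reference implementation from github.com/KellerJordan/Muon is available locally as muon.py (its class name and signature have varied across versions), the parameter split and all hyperparameters are illustrative, and the model is a two-layer stand-in for the real 32-layer transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from muon import Muon  # assumes muon.py from github.com/KellerJordan/Muon

# Two-layer stand-in for the real 32-layer GQA transformer.
model = nn.Sequential(nn.Embedding(1000, 512), nn.Linear(512, 1000)).cuda()

# Assumed split: Muon updates the hidden 2D weight matrices (attention and
# MLP projections in the real model); AdamW covers embeddings and the rest.
muon_params = [model[1].weight]
adamw_params = [p for p in model.parameters() if p is not model[1].weight]

opt_muon = Muon(muon_params, lr=0.02, momentum=0.95)    # illustrative values
opt_adamw = torch.optim.AdamW(adamw_params, lr=3e-4, weight_decay=0.1)

scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16 autocast
tokens = torch.randint(0, 1000, (8, 64), device="cuda")  # fake token batch

for step in range(10):
    opt_muon.zero_grad(set_to_none=True)
    opt_adamw.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(tokens)  # (batch, seq, vocab) next-token logits
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, 1000),
                               tokens[:, 1:].reshape(-1))
    scaler.scale(loss).backward()
    scaler.step(opt_muon)   # one GradScaler can drive both optimizers...
    scaler.step(opt_adamw)
    scaler.update()         # ...with a single scale update per iteration
```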

Results

Final metrics after 1B tokens:

  • Validation Loss: 3.1940
  • Validation Accuracy: 40.19%
  • Validation Perplexity: 24.38
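
These numbers are internally consistent: perplexity is just the exponential of the mean cross-entropy loss.

```python
import math

# Perplexity = exp(mean cross-entropy loss)
print(math.exp(3.1940))  # ~24.38, matching the reported perplexity
```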

Training Plot

[training curves image]

Usage

This model is a base model trained on a mix of educational data (FineWeb-Edu, Cosmopedia v2, and Python-Edu). It demonstrates reasonable storytelling and factual recall for its size, but it may hallucinate and has not been fine-tuned for instruction following.
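
A minimal loading-and-generation sketch is shown below, assuming the Hub checkpoint is compatible with transformers' Auto classes; if the repository ships a custom architecture, trust_remote_code=True may also be required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint loads through the Auto classes; add
# trust_remote_code=True if the repo defines a custom architecture.
repo = "vukrosic/Blueberry-Nano-151M"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("The water cycle begins when", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                        temperature=0.8, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because this is a base model, prompt it with text to continue rather than with instructions.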

Historical Context

This model (151M parameters) matches the scale of OpenAI's original GPT-1 (117M parameters) but was trained in under 3 hours on a single consumer GPU, illustrating how dramatically training efficiency has improved in recent years.


Created by the Open Superintelligence Lab.


Datasets used to train vukrosic/Blueberry-Nano-151M

  • FineWeb-Edu
  • Cosmopedia v2
  • Python-Edu

Evaluation Results

Self-reported validation metrics on the FineWeb-Edu, Cosmopedia v2, Python-Edu mix:

  • Validation Accuracy: 0.402
  • Validation Loss: 3.194
  • Validation Perplexity: 24.380