Blueberry-Nano (151M)
Blueberry-Nano is a GPT-1-level, 151M-parameter language model trained from scratch as part of the 5-Dollar-LLM project. It is intended as a learning resource for the LLM training process; it can't compete with today's LLMs (yet).
Model Details
- Developed by: Open Superintelligence Lab
- Model Type: Transformer Decoder-only (Dense)
- Architecture: 32 layers, GQA (4 KV heads), 512 embedding dim (see the attention sketch after this list)
- Language(s): English
- License: MIT
- Training Tokens: 1 Billion (1B)
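For readers unfamiliar with grouped-query attention (GQA), the idea is that several query heads share each key/value head, which shrinks the K/V projections and the KV cache. Below is a minimal PyTorch sketch using the card's 512 embedding dim and 4 KV heads; the query head count of 8 is an illustrative assumption, and this is not the project's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAttention(nn.Module):
    """Minimal grouped-query attention sketch (not the project's code).
    dim and n_kv_heads match the card; n_heads=8 is an assumption."""

    def __init__(self, dim=512, n_heads=8, n_kv_heads=4):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        # K/V projections are smaller: only n_kv_heads worth of output dims.
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head serves n_heads // n_kv_heads query heads: replicate
        # K and V along the head axis so the shapes line up.
        rep = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(rep, dim=1)
        v = v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))

x = torch.randn(1, 16, 512)
print(GQAttention()(x).shape)  # torch.Size([1, 16, 512])
```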
Training Environment
- Hardware: Single NVIDIA RTX 4090 (24GB)
- Training Time: 156.8 minutes (~2.6 hours)
- Optimizer: Muon & AdamW
- Precision: Automatic Mixed Precision (AMP)
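To show how AMP and the two-optimizer setup fit together, here is a generic PyTorch training step. It is a sketch under assumptions: `model`, `optimizers`, and `batch` are hypothetical names, and the Muon/AdamW parameter split is not specified by this card (a common pattern in from-scratch training runs is Muon for the 2-D hidden weight matrices and AdamW for embeddings, norms, and the output head).

```python
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid float16 underflow

def train_step(model, optimizers, batch, device="cuda"):
    # `optimizers` would hold both the Muon and AdamW instances (assumed).
    input_ids, targets = (t.to(device) for t in batch)
    for opt in optimizers:
        opt.zero_grad(set_to_none=True)
    # Forward pass runs in float16 where safe, float32 where needed.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(input_ids)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    scaler.scale(loss).backward()
    for opt in optimizers:
        scaler.step(opt)  # unscales gradients for each optimizer, then steps it
    scaler.update()
    return loss.item()
```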
Results
Final metrics after 1B tokens:
- Validation Loss: 3.1940
- Validation Accuracy: 40.19%
- Validation Perplexity: 24.38
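These numbers are mutually consistent: perplexity is the exponential of the (natural-log) cross-entropy validation loss, so the reported perplexity follows directly from the loss:

```python
import math
print(math.exp(3.1940))  # ≈ 24.39, matching the reported 24.38 up to rounding
```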
Usage
This model is a base model trained on a mix of educational data (FineWeb-Edu, Cosmopedia v2, and Python-Edu). It demonstrates reasonable storytelling and factual knowledge for its size, but may hallucinate and is not yet fine-tuned for instruction following.
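A minimal loading sketch, assuming the checkpoint is published in a Hugging Face transformers-compatible format (if the repo instead ships a custom architecture, it may require `trust_remote_code=True` or the project's own loading script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "vukrosic/Blueberry-Nano-151M"  # repo id as listed on this page
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

As a base model it continues text rather than following instructions, so prompts should be framed as completions.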
Historical Context
This model (151M parameters) reaches a scale comparable to OpenAI's original GPT-1 (117M parameters), but was trained in under 3 hours on a single consumer GPU, showcasing how much training efficiency has improved in recent years.
Created by the Open Superintelligence Lab.
Datasets used to train vukrosic/Blueberry-Nano-151M
- FineWeb-Edu
- Cosmopedia v2
- Python-Edu
Evaluation results
Self-reported results on the FineWeb-Edu, Cosmopedia v2, and Python-Edu mix:
- Validation Accuracy: 0.402
- Validation Loss: 3.194
- Validation Perplexity: 24.380
