license: apache-2.0
language:
  - en
tags:
  - llama
  - pretrained
  - from-scratch
  - gpuburnout
pipeline_tag: text-generation

GPUburnout-3B-75K

A 3.12-billion-parameter Llama-style decoder-only transformer, pretrained from scratch as the Season 4 model in the GPUburnout blog series. This is the final pretraining checkpoint, at 75,000 steps.

This is the base model, before any instruction tuning. For the chat-tuned version, see GPUburnout/GPUburnout-3B-75K-Chat (the shipped SFT champion).
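Because this is a base model, it continues text rather than following instructions. The sketch below shows one way to get a plain completion through the `transformers` library. The repo id `GPUburnout/GPUburnout-3B-75K` is an assumption inferred from the chat model's name, and the helper `complete` is hypothetical. It also assumes the repo ships a standard Llama-compatible config and tokenizer.

```python
REPO_ID = "GPUburnout/GPUburnout-3B-75K"  # assumed repo id, inferred from the chat model's name
MAX_POSITIONS = 2048                      # "Max position" from Model Details


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of `prompt` with the base model (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, torch_dtype=torch.float16)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    prompt_len = inputs["input_ids"].shape[1]

    # Keep prompt plus continuation inside the 2,048-position context window.
    budget = min(max_new_tokens, MAX_POSITIONS - prompt_len)
    if budget <= 0:
        raise ValueError("prompt already fills the context window")

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=budget, do_sample=False)

    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `complete("The mitochondria is")` would download the weights on first use. Phrase prompts as the start of a document, not as a question or instruction.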

Model Details

  • Architecture: Llama-style decoder-only transformer
  • Parameters: 3.12B
  • Hidden size: 2,560
  • Intermediate size: 10,240
  • Layers: 32
  • Attention heads: 40 query, 10 key/value (GQA)
  • Head dim: 64
  • Vocab size: 32,005
  • Max position: 2,048
  • Tie word embeddings: True
  • Precision: float16
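As a sanity check, the 3.12B figure can be roughly reproduced from the dimensions listed above. This is a sketch under stated assumptions, not the author's accounting: it assumes the usual Llama layout of bias-free linear layers, a gated MLP with three projection matrices, and one RMSNorm weight vector before attention, before the MLP, and after the last layer. Variable names are illustrative.

```python
# Dimensions taken from the Model Details list.
hidden = 2560
intermediate = 10240
layers = 32
q_heads = 40
kv_heads = 10
head_dim = 64
vocab = 32005

# Assumption: bias-free projections, as in standard Llama implementations.
q_proj = hidden * (q_heads * head_dim)
k_proj = hidden * (kv_heads * head_dim)   # GQA: fewer key/value heads than query heads
v_proj = hidden * (kv_heads * head_dim)
o_proj = (q_heads * head_dim) * hidden
attention = q_proj + k_proj + v_proj + o_proj

# Assumption: gated MLP with gate, up, and down projections.
mlp = 3 * hidden * intermediate

# Assumption: two RMSNorm weight vectors per layer.
per_layer = attention + mlp + 2 * hidden

# Tied word embeddings: the input embedding doubles as the output head,
# so it is counted once. The extra `hidden` is the final norm.
total = layers * per_layer + vocab * hidden + hidden

# Weight storage at 2 bytes per parameter (float16).
fp16_gib = total * 2 / 2**30

print(f"parameters: {total:,}")
print(f"float16 weights: {fp16_gib:.2f} GiB")
```

Under these assumptions the count rounds to 3.12B, matching the card, and the float16 weights take a little under 6 GiB.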

Training

  • Steps: 75,000
  • Final val loss: 2.2475
  • Training cost: ~$425 (RunPod A100/H200)
  • Pretraining data mix: FineWeb-Edu, FineMath, Stack-Edu-Python, PubMed abstracts
  • Optimizer: 8-bit AdamW
  • Hardware: Mix of A100 SXM 80GB and H200 NVL GPUs across the training run
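The card does not say why 8-bit AdamW was chosen, but a back-of-the-envelope estimate of optimizer state memory suggests one plausible motivation. The sketch assumes AdamW keeps two moment estimates per parameter, stored at 4 bytes each in the usual float32 case and at about 1 byte each in an 8-bit variant. It ignores quantization metadata and any tensors an implementation keeps in higher precision. The function name is illustrative.

```python
N_PARAMS = 3.12e9  # parameter count from Model Details


def optimizer_state_gb(n_params: float, bytes_per_state: int, n_states: int = 2) -> float:
    """Approximate optimizer state size in GB (10^9 bytes) for Adam-style optimizers."""
    return n_params * bytes_per_state * n_states / 1e9


# Assumption: float32 moments for standard AdamW, roughly one byte per moment for 8-bit AdamW.
fp32_gb = optimizer_state_gb(N_PARAMS, bytes_per_state=4)
int8_gb = optimizer_state_gb(N_PARAMS, bytes_per_state=1)

print(f"float32 AdamW state: {fp32_gb:.2f} GB")
print(f"8-bit AdamW state:   {int8_gb:.2f} GB")
```

On these assumptions the optimizer state shrinks from about 25 GB to about 6 GB, which leaves more of an 80 GB card for weights, gradients, and activations.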

Benchmarks (0-shot unless noted, float16)

| Benchmark | Score |
|---|---|
| TruthfulQA MC2 | 47.61% |
| HellaSwag (acc_norm) | 28.30% |
| ARC-Easy (acc_norm) | 43.06% |
| ARC-Challenge (acc_norm) | 21.84% |
| MMLU (5-shot acc) | 23.02% |

These are typical numbers for a small model trained on a limited token budget. The model has not seen enough data to develop deep academic knowledge: MMLU at 23.02% is at the 25% four-choice random baseline. It absorbed enough factual content for ARC-Easy to land at roughly 1.7x random. The TruthfulQA MC2 score of 47.61% is the highest in the table and, read as above random on calibrated truthfulness, is the strongest signal at this scale.
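The comparison against chance can be made explicit by dividing each accuracy by the random baseline. The sketch below assumes a 25% baseline for the four-choice tasks, which is only approximate for ARC because a small number of its questions have a different number of options. TruthfulQA MC2 is left out because it measures probability mass on true answers and has no single fixed chance level.

```python
RANDOM_BASELINE = 25.0  # assumed four-choice chance accuracy, in percent

# Scores copied from the benchmark table, in percent.
scores = {
    "HellaSwag (acc_norm)": 28.30,
    "ARC-Easy (acc_norm)": 43.06,
    "ARC-Challenge (acc_norm)": 21.84,
    "MMLU (5-shot acc)": 23.02,
}

# Ratio above 1.0 means better than chance, below 1.0 means worse.
ratios = {name: score / RANDOM_BASELINE for name, score in scores.items()}

for name, ratio in ratios.items():
    print(f"{name:26s} {ratio:.2f}x random")
```

Only ARC-Easy sits clearly above chance. HellaSwag is marginally above it, and ARC-Challenge and MMLU fall slightly below.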

Related Models

  • GPUburnout/GPUburnout-3B-75K-Chat: the chat-tuned (SFT) version of this base model

Blog

This model is part of the GPUburnout LLM-from-scratch blog series. Season 4 documents the 3B build, including the platform pivot from Thunder to RunPod, MooseFS storage debugging, and the 75K-step pretraining run.

https://gpuburnout.com

License

Apache 2.0. Free to use, modify, and redistribute. Attribution appreciated.