# GPUburnout-1B
A 1.04 billion parameter Llama-style language model trained from scratch on 11.8B tokens for $175.
## Model Details
- Architecture: Llama-style decoder-only transformer
- Parameters: 1.04B
- Hidden dim: 2048
- Layers: 16
- Attention: GQA (32 query heads, 8 KV heads)
- FFN: SwiGLU (intermediate 8192)
- Position encoding: RoPE (theta=500000)
- Context length: 2048 tokens
- Vocabulary: 32,005 tokens (BPE + 5 special tokens)
- Weight tying: Yes (embedding + LM head)
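The specs above are enough to reproduce the headline parameter count. A minimal sanity-check sketch, using only numbers from the list (head dim is inferred as hidden dim / query heads; two RMSNorm weights per layer plus a final norm are assumed, as in standard Llama):

```python
# Sanity-check the quoted 1.04B parameter count from the Model Details list.
vocab, dim, layers = 32005, 2048, 16
n_q_heads, n_kv_heads, ffn_dim = 32, 8, 8192

head_dim = dim // n_q_heads        # 64
kv_dim = n_kv_heads * head_dim     # 512 — GQA shrinks the K/V projections

embed = vocab * dim                        # tied with the LM head, counted once
attn = 2 * dim * dim + 2 * dim * kv_dim   # Wq, Wo full-width; Wk, Wv reduced
ffn = 3 * dim * ffn_dim                   # SwiGLU: gate, up, down matrices
norms = 2 * dim                           # two RMSNorm weight vectors per layer

total = embed + layers * (attn + ffn + norms) + dim  # + final norm
print(f"{total / 1e9:.2f}B parameters")              # ≈ 1.04B
```

The GQA and weight-tying choices are what keep this under the naive estimate: untied embeddings alone would add another ~65M parameters.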
## Training
- Data: 11.8B tokens (FineWeb-Edu 85%, Python-Edu 4.2%, FineMath 10.8%)
- Hardware: A100 SXM 80GB on RunPod
- Steps: 90,000
- Final loss: 2.494
- Total cost: ~$175 GPU compute
- Throughput: ~28,535 tokens/sec
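The training figures above can be cross-checked against each other. A back-of-envelope sketch, assuming the quoted throughput was sustained for the full run (the implied hourly GPU rate is derived here, not stated on the card):

```python
# Cross-check: at ~28,535 tok/s, how long do 11.8B tokens take,
# and what hourly rate does the ~$175 total imply?
tokens = 11.8e9
tok_per_sec = 28_535
cost_usd = 175

hours = tokens / tok_per_sec / 3600
print(f"wall clock: {hours:.0f} h")                 # ≈ 115 h
print(f"implied rate: ${cost_usd / hours:.2f}/h")   # ≈ $1.52/h
```

An implied rate around $1.5/h is consistent with typical community pricing for a single A100 SXM 80GB, so the quoted numbers hang together.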
## Benchmarks (0-shot)
| Benchmark | Metric | Score | Random |
|---|---|---|---|
| ARC-Easy | acc | 47.1% | 25% |
| HellaSwag | acc_norm | 28.8% | 25% |
| ARC-Challenge | acc_norm | 23.3% | 25% |
| MMLU | acc | 23.0% | 25% |
## Tokenizer
Includes ChatML special tokens for future SFT:
`<|im_start|>` (32000), `<|im_end|>` (32001), `<|system|>` (32002), `<|user|>` (32003), `<|assistant|>` (32004)
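Since the card reserves both ChatML delimiters and dedicated role tokens, a future SFT prompt might be serialized as below. This layout is an assumption — the base model has not been trained on any chat template yet, and the exact arrangement of role tokens is hypothetical:

```python
# Hypothetical chat serialization using the reserved special tokens.
# The base model is not instruction-tuned; this only illustrates how the
# five tokens could compose a prompt after SFT.
def chat_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|><|system|>{system}<|im_end|>"
        f"<|im_start|><|user|>{user}<|im_end|>"
        f"<|im_start|><|assistant|>"
    )

prompt = chat_prompt("You are a helpful assistant.", "Name the capital of France.")
print(prompt)
```

Reserving these token IDs now means a later fine-tune can adopt a chat format without resizing the embedding matrix.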
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("GPUburnout/GPUburnout-1B", torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained("GPUburnout/GPUburnout-1B")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Blog
Full training journey documented at gpuburnout.com
## Author
Jun Park (@GPUburnout)